









Introductory membership (dues payment of members under 30 years old)

Regular membership

Subscription rate, $8 per year.





JOURNAL OF THE AMERICAN 
STATISTICAL ASSOCIATION 


VOLUME 48 SEPTEMBER 1953 NUMBER 263








ARTICLES 
Measuring the Accuracy and Structure of Businessmen’s Expectations . Robert Ferber
Electronic Computation in Economic Statistics . J. A. C. Brown, H. S. Houthakker, and S. J. Prais
The Elements of an Industrial Classification Policy . Walt R. Simmons
Experimental Designs and Probability Sampling in Marketing Research . Brunk and Walter T. Federer
Improving National Marriage and Divorce Statistics . Hugh Carter
Sampling the Federal Old-Age and Survivors Insurance Records
Statistics in Chemical Experimentation . C. Daniel
Life Testing . Benjamin Epstein and Milton Sobel
Methods of Measuring Useful Life of Equipment Under Operational Conditions . Leo A. Goodman
On Some Procedures for the Rejection of Suspected Data . E. P. King
On the Use of Ranges, Cross-Ranges and Extremes in Comparing Small Samples . Hannes Hyrenius
The Distribution of the Product of Ranges in Samples from a Rectangular Population . Paul R. Rider
Confidence and Tolerance Intervals for the Normal Distribution . Frank Proschan
A Statistically Precise and Relatively Simple Method of Estimating the Bio-Assay with Quantal Response, Based on the Logistic Function . Joseph Berkson
Critical Values of the Log-Normal Distribution . Jack Moshman
The Partition of Error in Randomized Blocks . O. Kempthorne and W. D. Barclay
Summaries of Papers at the 112th Annual Meeting


BOOK REVIEWS 
Moroney, M. J., Facts from Figures . M. A. Girshick
Clark, Charles E., An Introduction to Statistics . Z. S. Malinowski
Rao, C. R., Advanced Statistical Methods in Biometric Research . Rosedith Sitgreaves
Gore, W. L., Statistical Methods for Chemical Experimentation . C. Daniel
Tintner, Gerhard, Econometrics . Daniel B. Suits
Jambunathan, M. V., The Theory of Linear Estimation . William G. Madow
McKinsey, J. C. C., Introduction to the Theory of Games . Irwin Bross
Coombs, Clyde H., A Theory of Psychological Scaling . Bert F. Green
Mueller, Robert Kirk, Effective Management through Probability Controls: How to Calculate Managerial Risks (Two Reviews) . J. C. Bain and Paul S. Olmstead
Kisselgoff, Avram, Factors Affecting the Demand for Consumer Installment Credit . George E. O’Rourke
Halcrow, Harold G., Agricultural Policy of the United States . Ivan M. Lee
Institute for Economic Research, Toyo Spinning Co., Causes of Decline in the World’s Cotton Textile Trade . Karl A. Fox
Belcher, John C., and Sharp, Emit F., A Short Scale for Measuring Farm Level of Living: A Modification of Sewell’s Socio-economic Scale . Fred L. Strodtbeck
McEntire, Davis, The Labor Force in California: A Study of Characteristics in Labor Force, Employment and Occupations in California, 1900-1950 . Gladys L. Palmer
Monahan, Thomas P., The Pattern of Age at Marriage in the United States, Vols. I and II . Paul H. Jacobson
Ashby, W. Ross, Design for a Brain . A. S. Householder

Random Digits







The Editors welcome the submission of manuscripts for possible publication. 
They should be typewritten entirely double-spaced, including footnotes, and 
two copies should be sent to the Editor, W. Allen Wallis, 207 Haskell Hall, 
University of Chicago, Chicago 37. Books for review should be sent to the same 
address. Unsolicited book reviews are not accepted, but suggestions of titles for 
review are welcome. 


EDITOR 


W. Allen Wallis, University of Chicago
Assistant to the Editor: Margaret A. Labadie


ASSOCIATE EDITORS 


Howard L. Jones, Illinois Bell Telephone Co.
Philip J. McCarthy, Cornell University
George M. Kuznets, University of California
I. Richard Savage, National Bureau of Standards
William G. Madow, University of Illinois
C. Ashley Wright, Standard Oil Company (N.J.)


ADVISORY PANEL OF FORMER EDITORS 


William G. Cochran (1945-50), Johns Hopkins University
Frank A. Ross (1926-34, 41-45), Thetford, Vermont
William F. Ogburn (1920-1925), University of Chicago
Frederick F. Stephan (1935-40), Princeton University


Errata: Readers and authors are urged to submit to the Editor notices of 
errors found in this or any previous issue. These will be published once 
a year, in the December issue. 







MEASURING THE ACCURACY AND STRUCTURE 
OF BUSINESSMEN’S EXPECTATIONS 


ROBERT FERBER 
University of Illinois 


In 1948 an extensive project on the relationship between businessmen’s expectations and business fluctuations was begun under the
joint sponsorship of the Merrill Foundation for the Advancement of 
Financial Knowledge and the University of Illinois. This project has 
been conducted under the direction of Franco Modigliani, who made 
important contributions to the present study and to the other sub- 
projects making up the program of research as a whole. As part of this 
program, an analysis was undertaken of the accuracy and structure of 
railroad shippers’ forecasts, probably the only continuous set of data 
on economic expectations in existence extending back quarterly to the 
1920’s and relating to individual industries and regions. A number of 
statistical problems arose in the course of analyzing these data, the 
treatment of which would seem to be of broad interest to economic stat- 
isticians and others working with time series data. It is the purpose of 
this article, therefore, to present these problems and the methods used 
to solve them. Some of the results of the study are also presented in the 
course of this exposition.¹


1. THE DATA 


The firms that account for the great bulk of railway shipments are 
members of the National Association of Shippers Advisory Boards. 
This organization has some 25,000 members. It is affiliated with the 





¹ The full results are being published in bulletin form by the Bureau of Economic and Business Research of the University of Illinois: The Railroad Shippers’ Forecasts, a monograph, a Study in Business Expectations and Planning, by Robert Ferber.

At this point, the writer would like to acknowledge the valuable assistance provided by several staff members of the Merrill Project in this study. Special thanks are due to Jack J. Feldman for his contributions to the analytical framework employed and to Mary Lou Walling and Jean Rogers for the various statistical services they performed.


385 





386 AMERICAN STATISTICAL ASSOCIATION JOURNAL, SEPTEMBER 1953


Association of American Railroads, and its primary function is to con- 
sult with, and advise, the AAR on shipping problems and transporta- 
tion conditions. To this end, the National Association is subdivided 
into thirteen regional boards, and within each board, into about 32 
major commodity groups and, where particular individual commodi- 
ties are of local importance, into commodity subgroups. The composi- 
tion of these regions and of the major commodity groups has remained 
remarkably stable since 1927. 

The forecasting procedure begins with the transmittal of requests 
to the shippers about the middle of a quarter for estimates of their 
freight car requirements in the next quarter. The forecasts are com- 
piled about two or three weeks later, usually by the secretary of the 
regional board, who is an employee of the AAR and is not a shipper; 
in one region the forecasts are compiled by the chairmen of the com- 
modity groups. Whoever compiles the data computes the expected per- 
centage increase or decrease in the next quarter’s shipments for each 
commodity group and for the region, as compared with the group’s 
shipments in the corresponding quarter of the previous year. At this 
stage, the commodity group chairman has the right to modify his 
group’s forecast if he so chooses, but actually such changes are appar- 
ently made only when some major development suddenly occurs which 
the shippers may not have anticipated, such as a labor stoppage. 

Though not designed on any probability sampling scheme, these 
data appear to be representative on the whole. Response rate of the 
members for any one commodity group may vary anywhere between 
25 and 80 percent. However, since special emphasis is placed on se- 
curing replies from the larger shippers, the coverage in terms of ship- 
ments is much higher and often almost equivalent to that of a com- 
plete census. The traffic managers, the officials who prepare the fore- 
casts, evidently do so carefully, and no evidence was found of any at- 
tempt to modify the forecasts so as to conceal information from com- 
petitors—that is, commodity-group chairmen—or to pad the forecasts 
so as to ensure having enough cars on hand. 


2. OBJECTIVES 


In the main, the analysis of these data had the following threefold 
objective: 

1. To measure the accuracy of the forecasts. 

2. To test hypotheses concerning the structure of the forecasts, i.e., 
whether the forecasts can be explained by events that happened in the 
past. 





MEASURING BUSINESSMEN’S EXPECTATIONS 387 


3. To determine whether some transformation of the forecasts can 
be used to improve the accuracy of the forecasts as such. 
Each of these points will be elaborated upon in later sections. 


3. THE MEASUREMENT OF ACCURACY 


There are two respects in which the accuracy of any forecasts can be 
evaluated. First we may ask: How close do the forecasts come to what 
actually happened? In other words, what is the margin of error in the 
shippers’ forecasts? Answering these questions, however, provides little 
information on the practical value of the forecasts. To determine the 
latter, we must ask: How does the error of the shippers’ forecasts com- 
pare with the error that would have been obtained by using some alter- 
native, readily available forecasting method? In effect, answering this 
question constitutes a test of the relative accuracy of the forecasts. No 
matter how close the forecasts may be to actual shipments, they cannot 
be of much practical value if some other simple forecasting procedure 
proves even more accurate. And conversely, the shippers’ forecasts may 
be of practical value in the sense of coming closer to actual shipments 
than alternative forecasting procedures, and still be considerably in 
error. 


3.1 Accuracy of Level Forecasts


A comparison between the levels of actual and forecasted carload- 
ings is shown in Chart 1. The upper panel refers to all carloadings and 
the lower panel to products of manufactures and mines only; agricul- 
tural commodities were excluded from the study. This panel indicates 
that the shippers’ forecasts tend to be too low in the upswings and too 
high in the downswings. In other words, the forecasts seem to lag behind 
actual events. Thus, the forecasts overestimated actual shipments all 
through the 1929-32 and 1937-38 contractions and even in the rela- 
tively mild recessions of late 1946 and 1948. The reverse was generally 
true for upswings. In fact, in a number of instances the forecast for the current quarter, denoted by the letter $t$, appears to be essentially the actual value for the corresponding quarter in the preceding year, $t-4$. For example, actual carloadings hit bottom in the second quarter of 1932, but the forecasts did not reach their low point until the second quarter of 1933.

Although this chart portrays the general trend of the forecasts in 
relation to actual shipments, it does not provide a precise indication of 
the accuracy of the forecasts. Such an indication may be obtained by 
studying the ratio of expected shipments to actual shipments, a ratio 














Chart 1. Forecasted and Actual Carloadings, 1927-1949.


denoted here by $E_t/A_t$ or, where no lags are involved, by $E/A$. Table 1 presents two statistics based on this ratio. One is a measure of over-all accuracy, the average of the absolute relative errors of the forecasts, i.e., $\sum |(E/A) - 1| / N$. The values for this statistic are presented for total carloadings and for nonfarm carloadings, excluding 1942-45, broken down by prewar and postwar periods and by the immediate trend of carloadings. Coal and ore shipments have been excluded from the nonfarm total in this part of the analysis because of their disproportionately high weight in the total and because of the extent to which shipments of these commodities are influenced by such factors as labor difficulties and weather conditions. Three trends are distinguished: rising, level, and falling. Since the analysis is based on seasonally unadjusted data, each type of trend is defined with reference to the ratio $A_t/A_{t-4}$, i.e., actual shipments in the current quarter divided by shipments in the corresponding quarter of the preceding year. A particular quarter is said to exhibit a falling trend if $A_t/A_{t-4}$ is less than .95, a level trend if $A_t/A_{t-4}$ is from .95 to 1.05 inclusive, or a rising trend if $A_t/A_{t-4}$ exceeds 1.05.
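The accuracy statistic and the trend classification just described can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used in the study: the function names and the toy series are hypothetical, and the sketch assumes aligned lists of forecasts $E_t$, actual shipments $A_t$, and year-earlier actual shipments $A_{t-4}$.

```python
def mean_abs_relative_error(E, A):
    """Average absolute relative error, sum(|E/A - 1|) / N, in percent."""
    return 100 * sum(abs(e / a - 1) for e, a in zip(E, A)) / len(A)


def classify_trend(a_t, a_t_minus_4):
    """Trend of a quarter from A_t/A_{t-4}: falling below .95,
    rising above 1.05, level from .95 to 1.05 inclusive."""
    ratio = a_t / a_t_minus_4
    if ratio < 0.95:
        return "falling"
    if ratio > 1.05:
        return "rising"
    return "level"


def error_by_trend(E, A, A_lag4):
    """Average absolute relative error (percent) within each trend group."""
    groups = {}
    for e, a, a4 in zip(E, A, A_lag4):
        groups.setdefault(classify_trend(a, a4), []).append(abs(e / a - 1))
    return {trend: 100 * sum(errs) / len(errs) for trend, errs in groups.items()}


# Hypothetical example: one falling, one rising and one level quarter.
E = [110, 95, 100]
A = [100, 100, 100]
A_lag4 = [120, 90, 100]
print(round(mean_abs_relative_error(E, A), 2))  # 5.0
print({k: round(v, 2) for k, v in error_by_trend(E, A, A_lag4).items()})
# {'falling': 10.0, 'rising': 5.0, 'level': 0.0}
```

The boundary values .95 and 1.05 fall in the level class, matching the "inclusive" wording of the definition.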

A measure of the dispersion of the estimates is presented in the last column of the table, which contains the values of the standard deviation of the ratio, $E/A$, about unity. The value of this statistic increases the further the ratios depart from 1, which in this case seems to provide a more relevant measure of dispersion than the standard deviation about the mean.²

TABLE 1
ACCURACY OF CARLOADINGS ESTIMATES, 1927-41, 1946-50

                         Average absolute percent of error      Standard deviation
Period                   All    Rising    Level    Falling      of ratios about 1

Nonfarm Carloadings
1927-41                         5.3                18.9
1946-50                         1*        3.5*     10.5†
1927-41, 1946-50                4.6                17.5

Total Carloadings
1927-41, 1946-50                5.5       4.3      15.4

* Fewer than 10 observations.
† Fewer than 5 observations.
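The dispersion measure in the last column of Table 1, and its relation to the ordinary standard deviation given in footnote 2, can be checked numerically. This is a sketch assuming population standard deviations (divisor N), the convention under which that identity holds exactly; the ratios are hypothetical.

```python
import math


def sd_about(values, center):
    """Root-mean-square deviation of values about a fixed center (divisor N)."""
    return math.sqrt(sum((v - center) ** 2 for v in values) / len(values))


def sd_about_unity_via_identity(ratios):
    """sigma about 1 = sigma about the mean * sqrt(1 + ((mean - 1) / sigma)^2).
    Requires the ratios not all to be equal (sigma > 0)."""
    mean = sum(ratios) / len(ratios)
    sigma = sd_about(ratios, mean)
    return sigma * math.sqrt(1 + ((mean - 1) / sigma) ** 2)


ratios = [1.10, 1.20, 0.90, 1.15]  # hypothetical E/A values
print(round(sd_about(ratios, 1.0), 4), round(sd_about_unity_via_identity(ratios), 4))

# Forecasts that are uniformly 10 percent too high have no spread about
# their mean, but a dispersion of about .10 about unity.
print(sd_about([1.1, 1.1, 1.1], 1.0))
```

The last line illustrates why the author prefers the spread about 1: a forecast series that is consistently biased looks perfectly "precise" about its own mean.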

Over the entire period studied, the use of this measure of accuracy indicates that the shippers’ forecasts, on the average, deviated about 10 percent from actual shipments in an individual quarter in the prewar years, with the error declining to 6 percent in the postwar years.³

When the observations are grouped according to the trend of car- 
loadings, considerable differences appear between the means of the 
various subgroups. In general, the estimates tend to be most accurate 
when no trend in carloadings is perceptible; this is especially true of 
total carloadings, for which the shippers’ forecasts came, on the 
average, to within 4 percent of actual shipments. 

The estimates tend to be least accurate in downswings. In upswings, 
shipments tend to be underestimated on the average by about 5 per- 
cent, which contrasts with the average overestimate of 17 percent in 
periods of contraction. This striking difference is also borne out by 





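² Actually the two are related by the expression

$$\sigma_{(E/A-1)} = \sigma_{E/A}\left[1 + \left(\frac{\overline{E/A} - 1}{\sigma_{E/A}}\right)^{2}\right]^{1/2},$$

where $\sigma_{(E/A-1)}$ is the standard deviation of the ratios about unity and $\sigma_{E/A}$ their standard deviation about the mean, $\overline{E/A}$.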


³ The greater accuracy of the forecasts in the postwar period is attributable, at least in part, to the
lesser amplitude of fluctuation of shipments in this period relative to that in the prewar years, which 
would leave smaller margins for error, other things being equal. 







analysis of the individual ratios. In every one of the 26 instances of de- 
clining carloadings, shipments were overestimated, whereas among the 
31 instances in which carloadings were rising, there were 26 cases of 
underestimates. The average absolute error of the forecast is also seen 
to be very much smaller in rising periods. 

Do these results indicate that a real tendency exists for the shippers 
to err more on declines than on rises or are the greater errors on down- 
swings primarily the result of the greater intensity of declines during 
the period studied?⁴ On examining the data, we find that facts can be
marshaled to support both hypotheses. During the period of observa- 
tion, the average decline, in those quarters in which declines occurred, 
was 20.3 percent; the average of the rises was 17.2 percent. In addition, 
the coefficient of determination between the percentage error in the 
forecasts and the percentage change in carloadings is .54 for the de- 
clines and .40 for the rises. Thus, the declines were in general sharper 
than the upswings, and large errors in forecasting do tend to be asso- 
ciated with large changes in carloadings. On the other hand, neither 
the difference between the average annual decline and the average an- 
nual rise nor the extent of association between the errors in the fore- 
casts and the magnitude of change is substantial—clearly not so pro- 
nounced as the difference between the average errors of the forecasts 


on declines and upswings. Nevertheless, some allowance for these differences seems necessary, and a simple means of making such an allowance lies in computing the regression of the ratio of expected to actual shipments on the percentage change in carloadings for declines and upswings separately. The results (for nonfarm carloadings) are as follows:

Declines:
$$E/A = 102\% + 0.733\,(\%\ \text{decline})$$
Rises:
$$E/A = 103\% - 0.385\,(\%\ \text{increase})$$
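The separate regressions can be sketched as two ordinary least-squares fits, one on the quarters of decline and one on the quarters of rise. This is a minimal sketch under two assumptions the text does not spell out: that the explanatory variable is the absolute percentage change in carloadings, and that quarters are split by the sign of that change. `ols` and `fit_by_direction` are hypothetical helpers; the two `reported_*` functions only evaluate the equations above.

```python
def ols(x, y):
    """Least-squares intercept and slope of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_by_direction(pct_change, ratio_pct):
    """Fit E/A (percent) on the size of the change, separately for declines and rises."""
    declines = [(-c, r) for c, r in zip(pct_change, ratio_pct) if c < 0]
    rises = [(c, r) for c, r in zip(pct_change, ratio_pct) if c > 0]
    return {
        "declines": ols([d for d, _ in declines], [r for _, r in declines]),
        "rises": ols([c for c, _ in rises], [r for _, r in rises]),
    }


def reported_decline(decline_pct):
    """Reported regression for declines, E/A in percent."""
    return 102 + 0.733 * decline_pct


def reported_rise(increase_pct):
    """Reported regression for rises, E/A in percent."""
    return 103 - 0.385 * increase_pct


# Effect of a 10 percent change on E/A, in percentage points.
print(round(reported_decline(10) - reported_decline(0), 2))  # 7.33
print(round(reported_rise(0) - reported_rise(10), 2))        # 3.85
```

Applied to actual quarterly data, `fit_by_direction` would return intercept and slope pairs of the same form as the two equations; the printed lines simply reproduce the 7.33 and 3.85 point effects discussed next.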
The regressions indicate that, on the average, a 10 percent contraction in carloadings increases the overestimate in the forecasts by 7.33 percentage points, whereas a 10 percent increase in carloadings increases the underestimate by 3.85 percentage points. It seems, therefore, that the shippers do indeed tend to err more when carloadings decline than when carloadings rise.⁵

⁴ Another alternative, suggested by a referee, is that the results are a peculiarity of the statistic used. For if E and A are independent of each other, the expected value of $\sum (E/A)/N$ will exceed unity, and much more so when E exceeds A than when A exceeds E, for the same set of alternatives. Though this bias in the statistic may account in part for the observed phenomenon, it is hardly likely to constitute a major cause because, even under the assumption of independence, the range of the ratio, E/A, is relatively so narrow (generally between .8 and 1.2) that it could not give rise to such large differences in accuracy as noted above.
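The referee's point in footnote 4 follows from Jensen's inequality: if E and A are independent, the expected value of E/A equals E[E]·E[1/A], and because 1/x is convex, E[1/A] is at least 1/E[A], so the expected ratio is at least 1 whenever E and A share the same mean. A small exact enumeration shows the size of the effect. The levels 90 and 110 are hypothetical, chosen so that E/A stays roughly within the .8 to 1.2 range the footnote cites.

```python
from itertools import product


def mean_ratio_if_independent(levels):
    """Mean of E/A over all equally likely (E, A) pairs drawn
    independently from the same set of levels."""
    pairs = list(product(levels, repeat=2))
    return sum(e / a for e, a in pairs) / len(pairs)


levels = [90, 110]  # hypothetical shipment levels
print(round(mean_ratio_if_independent(levels), 4))  # 1.0101

# Asymmetry: the same 20-unit miss moves the ratio further from 1
# when the forecast is too high than when it is too low.
over = 110 / 90 - 1
under = 1 - 90 / 110
print(round(over, 4), round(under, 4))  # 0.2222 0.1818
```

In this example the bias is about 1 percent, small next to the gap between the average overestimate on declines and the average underestimate on rises, which is in line with the footnote's conclusion.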

An additional bit of evidence favoring the second hypothesis is pro- 
vided by the existence of much the same phenomenon in the postwar 
period, as is shown by Table 1. The overestimates in the prewar period 
might have been explained on the ground that 1929-32, when most of 
the declines occurred, was not a representative period. The general 
expectation at the time was one of permanent prosperity, and in its 
initial phases the decline was considered to be a purely temporary phe- 
nomenon. At another time, it could be argued, such an erroneous expec- 
tation is not likely to be present. However, such an explanation is 
clearly not valid for the postwar years—in particular for 1949 when 
declines were widely expected. Yet the largest overestimates occurred 
at the end of that recession. 

Correlation of actual and anticipated rates of change. Comparison of the level of expected and actual shipments, though useful in itself, is an insufficient indicator of the accuracy of the forecasts. The forecasts and the actual shipments are necessarily related to each other because of the serial correlation in the data, i.e., because of the correlation of both $E_t$ and $A_t$ with $A_{t-1}$. For this reason, $E_t$ and $A_t$ may both be of the same general magnitude, with fairly high correlation, although the direction of change may be missed altogether. A more reliable indicator of accuracy is therefore obtained by removing this spurious element and comparing the anticipated and actual rates of change, i.e., $E_t/A_{t-1}$ and $A_t/A_{t-1}$, rather than the actual levels.

An analysis based on $E_t/A_{t-1}$ has the drawback of introducing the
problem of seasonal variation. Rather than attempt removal of the
seasonal component with the more or less doubtful standard tech- 
niques, it was decided to utilize a much simpler technique made pos- 
sible by the availability of data for a substantial number of years, 
namely, to analyze separately each quarter of the year. Unless the sea- 
sonal pattern changes markedly from year to year, this method seems 
most suitable for the present purpose. 

The coefficients of correlation between $E_t/A_{t-1}$ and $A_t/A_{t-1}$ obtained
for aggregate nonfarm carloadings for the prewar period are as follows:





⁵ The question might be raised whether the slope of the downswing regression may not be biased upward by the presence of extreme values in those two series. However, there seems to be little likelihood of such a bias because, even though the average decline exceeds the average rise, there are almost as many extreme rises (3) as there are extreme declines (4). In any event, the elimination of the four extreme changes from the downswing data, which almost equalizes the average decline and average rise, leads to the regression $E/A = 103\% + 0.667\,(\%\ \text{decline})$. The above conclusion would therefore remain the same.













             Correlation      5% level of significance
Quarter      coefficient      (absolute values)

1st            -.14                 .53
2nd            -.24                 .51
3rd            -.43                 .51
4th             .44                 .51





Clearly, the shippers’ forecasts completely failed to anticipate the 
rate of change of shipments. In the first three quarters the correlation 
is negative; when shippers are optimistic and anticipate a (more than 
seasonal) rise, actual shipments are more likely to fall than to rise. Only 
in the fourth quarter is the correlation positive, but it is very small. 
In fact none of these coefficients is significantly different from zero at 
the 5 percent level of significance. On the whole, therefore, the correla- 
tion between actual and anticipated shipments seems to be approxi- 
mately zero. If it differs from zero, it is more likely to be negative than 
positive.⁶
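The 5 percent values in the table follow from the usual t test for a correlation coefficient: with n observations, r is significant when |r| exceeds $t/\sqrt{t^2 + n - 2}$, where t is the two-sided 5 percent point of Student's t with n − 2 degrees of freedom. The text does not state n; this sketch assumes 14 and 15 annual observations, and takes the t values (2.179 for 12 and 2.160 for 13 degrees of freedom) from a standard t table rather than computing them.

```python
import math


def critical_r(t_crit, n):
    """Smallest |r| that is significant, given the tabled two-sided t value
    for n - 2 degrees of freedom: r = t / sqrt(t^2 + n - 2)."""
    return t_crit / math.sqrt(t_crit ** 2 + n - 2)


# Tabled two-sided 5 percent points of Student's t (assumed inputs).
T_05 = {12: 2.179, 13: 2.160}

print(round(critical_r(T_05[12], 14), 2))  # 0.53
print(round(critical_r(T_05[13], 15), 2))  # 0.51

# None of the quarterly coefficients reaches its critical value.
observed = {"1st": (-0.14, 0.53), "2nd": (-0.24, 0.51),
            "3rd": (-0.43, 0.51), "4th": (0.44, 0.51)}
print(all(abs(r) < crit for r, crit in observed.values()))  # True
```

The formula is just the t statistic $t = r\sqrt{n-2}/\sqrt{1-r^2}$ solved for r.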

This might seem a rather astonishing result. Actually, however, all 
that it shows is that the shippers’ forecasts are not unlike other fore- 
casts: they provide a good idea of the general level of business condi- 
tions—which is not surprising, considering the short period ahead for 
which the forecasts are made—but scant evidence as to the direction 
of change. The latter is the crucial problem in forecasting business con- 
ditions, and the shippers’ forecasts do not seem to supply the answer, 
at least so far as the aggregate of all reporting industries is concerned. 

It might be noted that much the same results were obtained when 
the same techniques were applied to commodity groups and to selected 
commodity groups within regions—reasonably good forecasts of level, 
much greater errors on downswings than on upswings, and near-zero 
correlation between actual and anticipated rates of change. 


3.2 Are the Forecasts Better than Simple Projections? 


Do the shippers’ forecasts provide more accurate estimates of car- 
loadings than might be secured through some simple projections rely- 
ing on the serial correlation in the actual series itself? If this is not the 
case, it would then seem that the information collected from the rail- 
road shippers is indeed of very little forecasting value. 





⁶ This conclusion was supported when $E_t/A_{t-1}$ was correlated with $A_t/A_{t-1}$ for all quarters combined, after the ratios were adjusted for seasonal variation. The seasonal adjustment consisted of dividing all the ratios for a given quarter by the average value of $A_t/A_{t-1}$ for that quarter, a procedure which in effect provides estimates of the ratios of the seasonally adjusted data. The resultant correlation was -.01.







Two alternative simple forecasting models were used in the test, both 
based on the serial correlation in the actual shipments. The first con- 
sisted in predicting carloadings in the current quarter at the same level 
as carloadings in the corresponding quarter of the preceding year. In 
other words, the forecast of carloadings by this method, say $E_t^*$, is simply $A_{t-4}$. Were it not for the seasonal element, $A_{t-1}$ would be preferable, but in the absence of seasonal adjustment $A_{t-4}$ is the most plausible choice.

This measure, however, has the disadvantage of making no allowance for short-run trends in business conditions, a disadvantage that is intensified by the use of $A_{t-4}$ instead of $A_{t-1}$. Some such allowance is clearly indicated, which leads to the second forecasting measure based on serial correlation. This measure, which we may call $E_t^{**}$, predicts carloadings in the current quarter as the level in the corresponding quarter of the preceding year adjusted for the change in carloadings over the last year; in other words, $A_{t-4}(A_{t-1}/A_{t-5})$.⁷
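The two naive projections can be written directly from these definitions. A minimal sketch, assuming a hypothetical list `A` of quarterly actual carloadings in which `A[t]` stands for $A_t$; $E_t^*$ needs four earlier quarters and $E_t^{**}$ needs five.

```python
def naive_last_year(A, t):
    """E*_t = A_{t-4}: carloadings in the same quarter of the preceding year."""
    if t < 4:
        raise ValueError("need four earlier quarters")
    return A[t - 4]


def naive_trend_adjusted(A, t):
    """E**_t = A_{t-4} * (A_{t-1} / A_{t-5}): last year's level, scaled by
    the year-over-year change observed in the most recent quarter."""
    if t < 5:
        raise ValueError("need five earlier quarters")
    return A[t - 4] * A[t - 1] / A[t - 5]


# Hypothetical series: a fixed seasonal pattern with 10 percent annual growth.
A = [100, 80, 90, 120, 110, 88]
print(naive_last_year(A, 5))       # 80
print(naive_trend_adjusted(A, 5))  # 88.0, equal to A[5]
```

With steady growth and a stable seasonal pattern, $E_t^{**}$ is exact while $E_t^*$ misses by the full year's growth, which is the short-run-trend allowance the text describes.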

Using these two measures, forecasts of carloadings were made for 
each quarter from 1927 to 1941, first for total nonfarm commodities 
(excluding coal and ore) and then separately for each of five selected 
commodity groups. The forecasts were segregated by the trend of car- 
loadings, and the accuracy of the forecasts obtained by each of these 
two methods was compared with the accuracy of the shippers’ forecasts. 
The comparison is presented graphically in Chart 2, which shows the 
proportion of time (quarters) that $E_t$ is more accurate than $E_t^*$ and $E_t^{**}$, in turn, by commodity group and trend of carloadings. The vertical dashed line overlapping each set of three bars indicates the relative accuracy of $E_t$ for each commodity group averaged over all three phases of the cycle. The shippers’ estimates are more or less accurate, on the average, than the simple projections to the extent that the bars and dashed lines extend to the right or left of the 50 percent guide line.
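The bars in Chart 2 are shares of quarters in which the shippers' forecast came at least as close to actual shipments as the projection. A sketch of that tally, assuming absolute error as the measure of closeness and counting ties in the shippers' favor (the chart heading reads "equals or exceeds"); the inputs and the function name are hypothetical.

```python
def share_at_least_as_accurate(E, P, A, trends):
    """Per trend, percent of quarters in which |A - E| <= |A - P|,
    where E is the shippers' forecast and P a naive projection."""
    wins, totals = {}, {}
    for e, p, a, trend in zip(E, P, A, trends):
        totals[trend] = totals.get(trend, 0) + 1
        if abs(a - e) <= abs(a - p):
            wins[trend] = wins.get(trend, 0) + 1
    return {trend: 100 * wins.get(trend, 0) / n for trend, n in totals.items()}


E = [100, 95, 105]      # shippers' forecasts (hypothetical)
P = [90, 100, 105]      # naive projection, e.g. E* or E**
A = [100, 100, 100]     # actual shipments
trends = ["rising", "rising", "level"]
print(share_at_least_as_accurate(E, P, A, trends))  # {'rising': 50.0, 'level': 100.0}
```

Values above 50 correspond to bars extending to the right of the 50 percent guide line.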

A number of points are evident from this chart. One is that marked 
differences in the relative accuracy of the shippers’ estimates exist in 
the various phases of the cycle. The type of projection used is, of course, 





⁷ The use of “naive models” as a means of evaluating the practical value of forecasts is nowadays
associated with the names of Milton Friedman and of such agricultural economists as T. W. Schultz 
and O. H. Brownlee (e.g., Schultz and Brownlee, “Two Trials to Determine Expectation Models 
Applicable to Agriculture,” Quarterly Journal of Economics, Vol. 56 (1942), pp. 487-496; Brownlee, 
O. H., and Gainer, W., “Farmers’ Price Anticipations and the Role of Uncertainty in Farm Planning,” 
Journal of Farm Economics, Vol. 31 (1949), pp. 266-275). However, Merle Crawford has brought to my 
attention references on the use of such methods in the literature of the late 1920's. Exact counterparts 
of the two models employed in this study were used to judge the accuracy of monthly forecasts of 
industrial production in Chicago for 1929-30 (King, R. B., “A Method of Appraising Short-term Fore- 
casts,” Journal of the American Statistical Association, Vol. 25 (1930), pp. 333-334). Similarly, the 
simple extrapolation of levels was suggested as an alternative, and possibly superior, method of fore- 
casting in 1927 (Comer, H. D., and Watkins, R. J., “Forecasting a Line by Itself,” Journal of the American Statistical Association, Vol. 22 (1927), pp. 505-507).










Chart 2. Accuracy of Shippers’ Expectations Relative to Simple Projections, 1927-1941. Panels: proportion of times accuracy of $E_t$ equals or exceeds that of $E_t^* = A_{t-4}$, and proportion of times accuracy of $E_t$ equals or exceeds that of $E_t^{**} = A_{t-4}(A_{t-1}/A_{t-5})$. Horizontal axes in percent; bars by commodity group (including iron and steel and agricultural implements) and by trend of carloadings (rising, level, falling).


also highly relevant, and interacts with the phase of cycle in determining the relative accuracy of $E_t$. When $E_t^*$ is the yardstick, the shippers’ estimates are superior except in level periods. This is only to be expected, for when carloadings remain level, $E_t^* = A_{t-4}$ is bound to be highly accurate by definition.

As compared with $E_t^{**}$, the shippers’ estimates tend to be superior much more frequently when carloadings are rising or level than when they are declining, although the overwhelming superiority of $E_t$ in the level phases of iron and steel and of agricultural implements carloadings is somewhat misleading because there are fewer than five observations in each case. The shippers’ estimates measure up very poorly against $E_t^{**}$ when carloadings are falling. Inspection of the individual observations indicates that the 1929-32 depression was the period when this discrepancy was greatest. When shipments were declining during this period, $E_t^{**}$ came very close to the actual figures, whereas overestimates by the shippers of as much as 50 percent were not uncommon.

Another measure of the forecasting value of the shippers’ estimates
was obtained by comparing the sizes of the errors committed by the 
shippers and by the mechanical formulas instead of just counting the 













number of times one forecast proved superior to the other. If the ship- 
pers’ forecasts are only slightly less accurate than the mechanical for- 
mula very frequently but happen to be appreciably better at certain 
crucial times, such as cyclical turning points, these two means of evaluation may produce very different results. To test this possibility, the average absolute deviation of the shippers’ forecasts from actual shipments, i.e., $\sum |(A_t - E_t)/A_t| / N$, was compared with the average absolute deviation of $E_t^{**}$ from actual shipments, i.e., $\sum |(A_t - E_t^{**})/A_t| / N$, for each of the five industries and for total nonfarm carloadings.

The results are presented in Table 2, covering the prewar and postwar periods separately. The figures in this table essentially support the previous findings with regard to the prewar period. The mechanical formula proves more accurate than the shippers’ forecasts for every industry, as shown in the last column of the table. If we call the average absolute deviation of the shippers’ forecast A, and that of $E_t^{**}$ B, then this column is $(B - A)/B$. When this value is negative, $E_t^{**}$ is more accurate on the average than the shippers’ forecasts, and when it is positive, the reverse is true. These figures show that $E_t^{**}$ is particularly superior for iron and steel, lumber, and cement; these are also the industries for which $E_t^{**}$ was more frequently accurate than the shippers’ forecasts in Chart 2.

TABLE 2
AVERAGE ABSOLUTE DEVIATION FROM ACTUAL SHIPMENTS OF SHIPPERS’ FORECASTS, $E_t$, AND OF $E_t^{**}$, BASED ON MECHANICAL EXTRAPOLATION, SELECTED INDUSTRIES

                              $\sum|(A_t-E_t)/A_t|/N$   $\sum|(A_t-E_t^{**})/A_t|/N$   Relative accuracy of
Industry                      (A)                       (B)                            shippers’ forecasts, $(B-A)/B$

1928-1941
Iron and steel                8%
Lumber
Flour
Cement
Agricultural implements
Total nonfarm

1946-1950
Iron and steel
Lumber
Flour
Cement
Agricultural implements
Total nonfarm*

* Second quarter, 1947-50. Total nonfarm carloadings in 1945 included shipments of war goods which could not be separated out of the total.
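The comparison in Table 2 reduces to two averages and one ratio. A sketch with hypothetical inputs that follows the text's notation: A is the shippers' average absolute deviation, B that of $E_t^{**}$, and the last column is $(B - A)/B$. The function names are illustrative, and B must be nonzero.

```python
def avg_abs_deviation(forecasts, actual):
    """sum(|(A_t - F_t) / A_t|) / N for a forecast series F."""
    return sum(abs((a - f) / a) for f, a in zip(forecasts, actual)) / len(actual)


def relative_accuracy(shippers, mechanical, actual):
    """(B - A) / B: positive when the shippers' forecasts are more accurate,
    negative when the mechanical extrapolation is."""
    a = avg_abs_deviation(shippers, actual)
    b = avg_abs_deviation(mechanical, actual)
    return (b - a) / b


actual = [100, 100]
shippers = [110, 90]    # hypothetical: misses of 10 percent
mechanical = [95, 105]  # hypothetical: misses of 5 percent
print(relative_accuracy(shippers, mechanical, actual))  # -1.0: E** more accurate
```

Because the ratio is scaled by B, a value of 0.26 would mean the shippers' average error is 26 percent smaller than that of the mechanical formula.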

In the postwar period, however, the shippers' forecasts appear in a
more favorable light. They are more accurate than the mechanical
formula for three of the five industries tested, lumber and agricultural
implements being the two exceptions. For 1947-50, the shippers' fore-
casts for agricultural implements and lumber are also more accurate
than those of the mechanical formula. The margin of accuracy in favor
of the shippers' estimates is as high as 26 percent for iron and steel and
20 percent for flour.

The results may indicate some degree of permanent improvement in 
the relative accuracy of the shippers’ forecasts or they may be due to 
the special circumstances prevailing in the postwar years. That the 
second factor accounts for at least part of the improved accuracy of 
the shippers’ forecasts is probable because (a) greater errors were found 
in the shippers’ forecasts on downswings than on upswings and there 
was a positive relation between the amount of error and the magnitude 
of change in shipments, and (b) the frequency in the postwar years of
strikes and other special factors, which the shippers could foresee and
which place a mechanical formula at a distinct disadvantage. In more
than one quarter there is clear evidence that the shippers anticipated 
a strike and modified their estimates accordingly. Because of the nature 
of the mechanical formula, the effect of such action is to favor the ship- 
pers’ forecasts not only in the quarter in which the strike occurs but in 
three others as well. 
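A minimal sketch of why one strike quarter favors the shippers over several quarters, assuming the mechanical formula has the form E_t^{**} = A_{t-1}A_{t-4}/A_{t-5} (the reading of (1.3) given in footnote 8). The series, the strike size and the helper names are hypothetical. The formula cannot see the strike in the quarter it occurs, and the depressed value then re-enters later forecasts as A_{t-1}, A_{t-4} and A_{t-5}, i.e., in three other quarters.

```python
def mechanical_forecast(a, t):
    """E**_t = A_{t-1} * A_{t-4} / A_{t-5}, from a list of actual shipments."""
    return a[t - 1] * a[t - 4] / a[t - 5]


def distorted_forecasts(a_normal, a_strike, tol=1e-9):
    """Quarters whose mechanical forecast differs between the two series."""
    return [
        t
        for t in range(5, len(a_normal))
        if abs(mechanical_forecast(a_normal, t) - mechanical_forecast(a_strike, t)) > tol
    ]


# Hypothetical flat series of 100 per quarter; a strike in quarter 6 cuts it to 60.
normal = [100.0] * 14
strike = normal.copy()
strike[6] = 60.0

print(mechanical_forecast(strike, 6))       # 100.0: the strike itself is missed
print(distorted_forecasts(normal, strike))  # [7, 10, 11]
```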

All in all, therefore, these findings seem to indicate that the fore- 
casting value of the shippers’ estimates relative to possible mechanical 
devices may be substantial only in periods in which special factors, such 
as labor difficulties and limitation of output by capacity, prevail. This 
does not mean, however, that these data might not be useful for im- 
proving the forecasts at other times when taken in combination with 
other factors. 


4. STRUCTURE OF THE FORECASTS 


The question to which we address ourselves in this section is: What 
are the major factors that explain the expected level of shipments, i.e., 
what is the structure of the forecasts? Accuracy of the forecast is of no 
concern to us here, for irrespective of their accuracy we want to deter- 
mine how well we can “forecast the forecasts”. 





MEASURING BUSINESSMEN’S EXPECTATIONS 397 


4.1 The Basic Hypothesis: Extrapolation of Recent Experience 


The hypothesis encountered most frequently in discussions of the 
formation of expectations and in statistical models of our economic 
system is that expectations represent extrapolations of recent experi- 
ence. The assumed extrapolation might be one of level or an extrapola- 
tion of recent rate of change. 

The “extrapolation of level” hypothesis can be stated symbolically 
as follows: 


$$E_t = A_{t-1} + u_t \tag{1.1}$$


where u_t denotes an "error" term which might be random or might
itself represent the influence of other variables, but which, by hypothesis,
should be small.

The “extrapolation of trend” hypothesis might be stated in the form 


$$E_t = A_{t-1} + a\,(A_{t-1} - A_{t-2}) + u_t^{*} \tag{1.2}$$


If a is positive (but smaller than unity), as seems usually to be assumed,
an extrapolation of recent trend is indicated. However, a could also be
negative. Though little attention seems to have been given to this pos- 
sibility, it would be better to refer to this case as a reversal of trend 
rather than as an extrapolation of trend; for a negative a would imply 
that activity is expected to contract below the present level whenever 
an expansion has occurred in the previous period, and vice versa. 


4.2 The Problem of Seasonal Variation 


Any attempt to test hypotheses of the form (1.1) or (1.2) when quar- 
terly data are involved raises the problem of seasonal variation. This 
is particularly true in the present study because sizable seasonal fluc- 
tuations exist in shipments for individual commodities as well as for the 
over-all aggregates. 

In the presence of such pronounced seasonal variations, hypotheses 
such as (1.1) and (1.2) cannot reasonably be tested directly from the 
raw data unless we are willing to make the absurd assumption that 
shippers are completely unaware of seasonal variation in their ship- 
ments—a hypothesis convincingly disproved by the results. If the ship- 
pers are aware of seasonal variation, then it must be assumed that some 
adjustment is made for the effect of recurring seasonal changes in pro- 
jecting recent experience. Obviously, then, some kind of adjustment of 
the data for seasonal variation is required before any reasonable type 
of extrapolation hypothesis can be tested. 

One approach to this problem would be to estimate coefficients of 







seasonal variation from the data, use these coefficients to remove the 
seasonal component, and then proceed with the tests. This procedure is 
nevertheless not entirely satisfactory for the purpose. For one thing, it 
is well known that standard methods of eliminating seasonal variation 
from data are unsatisfactory in many respects, particularly in the arbi- 
trary means involved in obtaining the seasonal coefficients. A second 
objection derives from the basic purpose of this analysis, which is to 
“explain” how shippers form their anticipations. It is clear therefore 
that the question to be asked is not: “What is the best method of ad- 
justing data for seasonal variation?” but rather “What do we know 
about the way in which shippers actually make adjustment for seasonal 
variation?” 

Information on the estimating methods used by the individual ship- 
pers was obtained partly through a mail survey of commodity chair- 
men of the regional boards and partly through interviews with ship- 
pers and officers of some of those boards. Although the scope of the 
survey was very small, complete unanimity of opinion was expressed 
regarding a wide range of commodity groups and regions. In almost all 
instances where a commodity was subject to seasonal variations, the 
shipper indicated that he obtained his forecast by adjusting his actual 
shipments in the corresponding quarter of the previous year for changes 
taking place during the intervening year and for any unusual conditions 
that prevailed in that quarter or that were likely to prevail in the quar- 
ter for which the estimate was made. In other words, the shippers them- 
selves rely upon an implicit method of seasonal adjustment based on 
their use of the corresponding quarter of the previous year as the start- 
ing point for their forecasts. 

Another factor favoring the use of implicit seasonal adjustment is the 
manner in which the AAR requests the forecasts, namely, as a percent- 
age of the shippers’ carloadings in the corresponding quarter of the 
previous year. Because of this fact, the individual shipper is inclined 
to prepare his forecast with reference to that earlier quarter. In no 
instance was there any evidence of a shipper's using his level of car-
loadings in the immediately preceding quarter rather than A_{t-4} as a
base for his forecast.


4.3 The Extrapolation Hypothesis


If the forecaster starts from actual shipments in the corresponding
quarter of the year before, i.e., A_{t-4}, and extrapolates the latest level
of activity, which is represented by A_{t-1}, he will have to adjust for the
growth or decline which has already occurred between A_{t-4} and A_{t-1}.








The intervening change, however, cannot be measured simply by
A_{t-1} − A_{t-4} (or some simple variant thereof), because this difference
is itself affected by seasonal variation. Lacking seasonally adjusted data,
a simple approximation to the intervening change that is not affected by
seasonal variation is the change occurring during the entire past year,
which is, in absolute terms, A_{t-1} − A_{t-5}, or, in proportionate terms,
(A_{t-1} − A_{t-5})/A_{t-5}.

If this adjustment for the intervening change is applied to A_{t-4}, we
obtain a formula of the type

$$E_t = A_{t-4} + A_{t-4}\,\frac{A_{t-1} - A_{t-5}}{A_{t-5}} \tag{1.3}$$


This is precisely the "mechanical formula" used to compute E_t^{**} in
the accuracy test in Section 3.2.⁸
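As a check on footnote 8, a minimal sketch computing (1.3) in its stated form and as the product A_{t-1}·A_{t-4}/A_{t-5}; the two agree algebraically. The function names and shipment figures are hypothetical.

```python
def mechanical_formula(a_t1, a_t4, a_t5):
    """(1.3): A_{t-4} raised by the proportionate change from t-5 to t-1."""
    return a_t4 + a_t4 * (a_t1 - a_t5) / a_t5


def mechanical_formula_product(a_t1, a_t4, a_t5):
    """Footnote 8 reading: latest level A_{t-1} times the seasonal ratio A_{t-4}/A_{t-5}."""
    return a_t1 * a_t4 / a_t5


# Hypothetical shipments: A_{t-1} = 120, A_{t-4} = 100, A_{t-5} = 80.
print(mechanical_formula(120, 100, 80))          # 150.0
print(mechanical_formula_product(120, 100, 80))  # 150.0
```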

If it is true (1) that the shippers rely on an indirect method of sea-
sonal adjustment which consists in adjusting A_{t-4} for the change oc-
curring during the past year, and (2) that expectations represent pri-
marily an extrapolation of level, then we should expect to find that the
expression on the right-hand side of (1.3) largely accounts for the ob-
served fluctuations of E_t.

However, it is clearly desirable to set up a statistical method to test 
separately the tenability of the two hypotheses just stated. To achieve 
this result we might begin by testing a more general form of hypothesis 
(1.3), namely

$$E_t = a + b\,A_{t-4} + c\,A_{t-4}\,\frac{A_{t-1} - A_{t-5}}{A_{t-5}} \tag{1.4}$$

Then, (1.4) coincides with (1.3) when

$$a = 0, \quad b = 1, \quad c = 1. \tag{1.5}$$


If both hypotheses were correct, the regression coefficients obtained
by fitting (1.4) to the data should approximately satisfy the three con-
ditions (1.5). If, on the other hand, the first two conditions of (1.5) were
satisfied but not the last, it would indicate that the shippers do rely on
an adjustment of A_{t-4} in preparing their forecasts and that expecta-
tions do not represent an extrapolation of level.
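A minimal sketch of the test just described: fit (1.4) by ordinary least squares and compare the estimates with (1.5). The data are synthetic, with forecasts generated exactly from (1.3) on the scale of E_t quoted below (1.4 to 3.8), so the fit should return a = 0, b = 1, c = 1; the helper names are hypothetical and this does not reproduce the paper's computations.

```python
import random


def solve(m, v):
    """Solve the square system m x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [list(row) + [rhs] for row, rhs in zip(m, v)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))) / aug[r][r]
    return x


def fit_equation_1_4(expected, actual):
    """Least-squares (a, b, c) for E_t = a + b A_{t-4} + c A_{t-4} (A_{t-1} - A_{t-5}) / A_{t-5}."""
    rows, ys = [], []
    for t in range(5, len(actual)):
        adjustment = actual[t - 4] * (actual[t - 1] - actual[t - 5]) / actual[t - 5]
        rows.append((1.0, actual[t - 4], adjustment))
        ys.append(expected[t])
    # Normal equations X'X beta = X'y for the three regressors.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    return solve(xtx, xty)


# Synthetic data only: the first five quarters have no forecast.
random.seed(0)
actual = [random.uniform(1.4, 3.8) for _ in range(60)]
expected = [None] * 5 + [
    actual[t - 4] * (1 + (actual[t - 1] - actual[t - 5]) / actual[t - 5])
    for t in range(5, 60)
]
a_hat, b_hat, c_hat = fit_equation_1_4(expected, actual)
print(round(a_hat, 6), round(b_hat, 6), round(c_hat, 6))
```

On real forecasts the estimates would of course not satisfy (1.5) exactly; the point of the comparison is how far c departs from unity.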

It might be argued, however, that (1.3) represents in reality some- 
thing more than a mere extrapolation of level. For the adjustment of 





8 Notice that the right-hand side of this expression can be interpreted as the product of the latest
level of shipments, A_{t-1}, and the factor A_{t-4}/A_{t-5}, which represents a crude adjustment for seasonal
variation from quarter t-1 to quarter t.







the level A_{t-1} for seasonal variation by means of the ratio A_{t-4}/A_{t-5} also
tends to raise this level to the extent that A_{t-4}/A_{t-5} incorporates an ele-
ment of trend. Assuming a rising linear trend, the element of trend in
this ratio would be one-fourth of the total change due to trend in any
single year, or one-fourth of the change due to trend from A_{t-4} to A_t.
Therefore, to eliminate this trend element in projecting the level A_{t-1} to
A_t, which is the same as adjusting A_{t-4} to the level of A_{t-1} in (1.4), the
coefficient c in (1.4) could be as low as .75 under this assumption and
still not contradict the extrapolation-of-level hypothesis.
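The .75 figure can be checked numerically. A minimal sketch, assuming a purely linear trend with no seasonal or random component (the level, slope and quarter index are arbitrary): it finds the c that, with a = 0 and b = 1 in (1.4), carries A_{t-4} only up to A_{t-1}. The year's change spans four quarters of trend, the gap from t-4 to t-1 only three, so c comes out near .75.

```python
def c_reaching_latest_level(level, slope, t):
    """c in (1.4), with a = 0 and b = 1, that projects A_{t-4} only up to A_{t-1}
    when shipments follow the hypothetical linear trend A_k = level + slope * k."""

    def shipments(k):
        return level + slope * k

    adjustment = shipments(t - 4) * (shipments(t - 1) - shipments(t - 5)) / shipments(t - 5)
    return (shipments(t - 1) - shipments(t - 4)) / adjustment


print(round(c_reaching_latest_level(level=1000, slope=1, t=10), 3))  # 0.749
```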

The result obtained from fitting (1.4) to shipments of all commodi- 
ties other than farm products for 1927-41 is: 





$$E_t = .09 + .986\,A_{t-4} + .43\,A_{t-4}\,\frac{A_{t-1} - A_{t-5}}{A_{t-5}}, \qquad R^2 = .972 \tag{1.6}$$


It is clear that the hypothesis fits the data very well. At the same
time, the coefficient b of A_{t-4} is close to unity and the constant term a
is close to zero. (E_t fluctuates between 1.4 and 3.8, with an average of
2.3.) However, the coefficient c, instead of being close to .75, turns out
to be only .43.

These results suggest the following conclusions: 

1. Hypothesis (1.4) appears to describe remarkably well the formation
of shippers' anticipations, at least in the aggregate.

2. Anticipations, far from representing extrapolations of recent
trend, appear to represent a sharp reversal of trend.

Since the second conclusion is rather startling, let us examine in 
greater detail the basis for this conclusion, as well as its implications. 

The implication of the statistical results represented by (1.6) may
be grasped more easily by changing b from .986 to 1.0, a from .09 to 0,
and c to .44.⁹ With these modifications, (1.6) may be rewritten as
follows:

$$E_t = A_{t-1}\,\frac{A_{t-4}}{A_{t-5}} - .56\,(A_{t-1} - A_{t-5})\,\frac{A_{t-4}}{A_{t-5}} \tag{1.7}$$

As noted earlier, the first term of this equation can be considered to
represent primarily an extrapolation of the latest level A_{t-1}, crudely ad-
justed for seasonal variation by means of the ratio A_{t-4}/A_{t-5}. But the





9 The reason for raising c to .44 is that the most significant feature of (1.6) for the present purposes
is the difference between the coefficients b and c. This difference in (1.6) is .556; in order to keep the
difference approximately constant when b is raised to unity, we also raise c to .44.







second term shows that the respondents tend to modify this extrapola-
tion of level by subtracting from it more than half the change that has
occurred during the past year, again crudely adjusted for seasonal
variation by A_{t-4}/A_{t-5}. Thus, if A_{t-1} exceeds A_{t-5}, the projection of level
represented by the first term is adjusted downward, that is to say,
against the recent trend. The opposite is true when shipments have
been falling. Note, however, that the relative position of E_t and A_{t-1}
depends also on the value of A_{t-4}/A_{t-5}, and in particular on the ratio
of the seasonals for these two quarters. The inversion of trend will tend
to be present only in terms of seasonally adjusted data. In terms of the
raw data, an outright inversion of trend will manifest itself only when
the change from A_{t-4} to A_{t-1} is large relative to A_{t-4}/A_{t-5}.

Chart 3 illustrates this possibility. Panel 1 portrays a hypothetical
course of shipments between quarters t-5 and t-1. Shipments are as-
sumed to be at a level of 100 in both quarters t-5 and t-4 and to rise 20
percent by quarter t-1. No specific assumptions are made about the
level of shipments in quarters t-2 and t-3, since it is assumed, in (1.6),
that the course of shipments in these quarters has no bearing on the fore-
cast made at point t-1. Between t-4 and t-1 the shipments could, in
principle, take any course whatever. The dot corresponding to quarter
t in this panel represents the forecast for that quarter made in quarter
t-1 according to (1.7). This forecast represents an increase of 44 per-
cent of 20, or 8.8, above quarter t-4 and, therefore, a decline of 56 per-
cent of 20, or 11.2, below quarter t-1.
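The Panel 1 arithmetic follows directly from (1.7). A minimal sketch, with the .56 weight taken from (1.7) and the function name hypothetical:

```python
def regressive_forecast(a_t1, a_t4, a_t5, weight_on_change=0.56):
    """(1.7): E_t = A_{t-1} r - 0.56 (A_{t-1} - A_{t-5}) r, with r = A_{t-4} / A_{t-5}."""
    r = a_t4 / a_t5
    return a_t1 * r - weight_on_change * (a_t1 - a_t5) * r


# Panel 1 of Chart 3: 100 in quarters t-5 and t-4, 20 percent higher in t-1.
e_t = regressive_forecast(a_t1=120, a_t4=100, a_t5=100)
print(round(e_t, 1))        # 108.8
print(round(e_t - 100, 1))  # 8.8 above quarter t-4
print(round(120 - e_t, 1))  # 11.2 below quarter t-1
```

Because A_{t-5} = A_{t-4} here, the seasonal ratio r equals 1 and the forecast falls below A_{t-1} even in raw terms.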

Panels 2, 3 and 4 of Chart 3 illustrate the regressive character of the 
forecasts by showing the actual behavior of shipments and the actual 
forecasts made by shippers in three selected quarters, namely, the sec- 
ond quarter of 1933 and the third and fourth quarters of 1936; also, 
the anticipated shipments as computed from (1.7), shown by an arrow. 

Panel 2 is particularly interesting since it coincides with the lower 
turning point of the 1929-37 cycle; the shippers’ anticipations appear 
to have caught this turning point, although they underestimated the 
size of the increase.¹⁰ The fact that the shippers' anticipations were so
close to the value obtained from (1.6) raises the suspicion that this 
success in forecasting the turning point was hardly more than chance. 
The regressive character of the shippers’ anticipations led them to ex- 
pect an expansion throughout the period of contraction from 1929 to 
1932. After three years of failure, this anticipation was finally justified; 





10 Some of the anticipated rise could be accounted for by seasonal influences, since the second
quarter is seasonally higher than the first. However, the change anticipated by shippers was undoubtedly 
more than seasonal. 





Chart 3. Illustrations of Regression Phenomenon. (Panel 1: hypothetical carloadings, quarters
t-5 to t. Panel 2: carloadings in millions, quarterly, 1932-1933. Panels 3 and 4: carloadings in
millions, quarterly, 1935-1936.)

















but note (in Chart 1) that the shippers anticipated an impending con- 
traction in every one of the following three quarters (third and fourth 
quarters of 1933 and first quarter of 1934), a forecast which led to con- 
spicuous underestimates in two of these quarters. 


4.4 Reliability of the Extrapolation Hypothesis 
The very high determination coefficient obtained would normally
lead to confidence in the reliability of the results. In this case, however,
the size of the determination coefficient alone is not a very reliable test
because of the serial correlation in the data. As noted earlier, E_t and
A_t are necessarily highly correlated (r² = .85) as a result of the correla-
tion of each of these variables with A_{t-4}. For these reasons alone, (1.4)
is bound to produce high correlation even if the true factors affecting
expectations differed from those described by our hypothesis. There-
fore, alternative means must be used to test the reliability of the re-
sults. Two such means were used in this study, as described below.





4.4.1 Fitting Equation (1.4) to A_t


The first test involves recomputing (1.4) with A_t as the dependent
variable instead of E_t. This test aids in evaluating the reliability of our
hypothesis in two ways:

1. If the correlation of E_t with the two independent variables were
accounted for exclusively or primarily by its correlation with A_t, then
we should expect the correlation of E_t with these variables to be lower,
or at any rate not higher, than the correlation of A_t with these vari-
ables.

2. If the regression coefficients of (1.4) fitted to A_t turn out to be
substantially similar to those of (1.6), there would be reason to doubt
the reliability of at least some of the earlier conclusions. However, since
the correlation of A_t with these variables should be due to serial cor-
relation, the coefficients of the two variables should be relatively close
to each other.

The outcome of the test is 


$$A_t = .25 + .886\,A_{t-4} + .823\,A_{t-4}\,\frac{A_{t-1} - A_{t-5}}{A_{t-5}}, \qquad R^2 = .90 \tag{1.8}$$


As expected, the multiple correlation for (1.8) is high; yet it is much 
smaller than that for (1.6). Tests of significance reveal the difference 
to be highly significant. 

Even more impressive is the fact that the two regression coefficients 
of (1.8) are close to each other, especially when compared with the 
sizable difference in the coefficients of (1.6). This test therefore con- 
firms the absence of any significant tendency on the part of actual 
shipments to regress toward the past; regression is a property of an- 
ticipations and has no counterpart in the actual course of events. 


4.4.2 Explanation of the Rate of Change 


As in the evaluation of the accuracy of the forecasts, we can seek to
reduce the disturbing influence of serial correlation by working with
the anticipated rate of change, E_t/A_{t-1}, instead of with E_t itself. In
other words, we may ask: What are the factors determining the direc-
tion and amount of change anticipated by the shippers as measured by
E_t/A_{t-1}? The fact that E_t/A_{t-1} is apparently uncorrelated with
A_t/A_{t-1} (Table 3, line 4) represents an additional advantage in favor of
working with this variable, for any significant relationship established
between E_t/A_{t-1} and other variables cannot then be attributed to the
correlation of these variables with A_t/A_{t-1}.







This approach, however, introduces once more the problem of sea- 
sonal variation, and the problem is dealt with, as before, by analyzing 
each quarter separately. If the seasonal pattern was substantially un- 
changed over the years studied, it can be shown that this method is a 
very satisfactory one for estimating parameters when the variables are 
subject to seasonal variation. In the present case, this condition cannot 
be asserted to be fully met, although indications are that no substantial 
changes in the seasonal pattern occurred.¹¹

The variable E_t/A_{t-1} can be introduced into (1.4) by dividing both
sides of the equation by A_{t-1}. Doing so, and taking into account the
fact that the constant term in this equation was found to be close to
zero, we have:

$$\frac{E_t}{A_{t-1}} = (b - c)\,\frac{A_{t-4}}{A_{t-1}} + c\,\frac{A_{t-4}}{A_{t-5}} + u_t \tag{1.9}$$
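The step from (1.4) to (1.9) can be written out term by term: with a set to zero, the only algebra needed is to split the second term of (1.4). Here u_t stands for the error term after division by A_{t-1}.

```latex
\begin{aligned}
\frac{E_t}{A_{t-1}}
  &= b\,\frac{A_{t-4}}{A_{t-1}}
   + c\,\frac{A_{t-4}}{A_{t-1}}\cdot\frac{A_{t-1}-A_{t-5}}{A_{t-5}} + u_t \\
  &= b\,\frac{A_{t-4}}{A_{t-1}}
   + c\left(\frac{A_{t-4}}{A_{t-5}} - \frac{A_{t-4}}{A_{t-1}}\right) + u_t \\
  &= (b-c)\,\frac{A_{t-4}}{A_{t-1}} + c\,\frac{A_{t-4}}{A_{t-5}} + u_t .
\end{aligned}
```

With the estimates of (1.6), b − c is about .556, so the anticipated rate of change E_t/A_{t-1} rises with A_{t-4}/A_{t-1}, that is, when shipments have fallen over the three quarters from t-4 to t-1.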


It should be noted that once the seasonal variation has been elimi-
nated by limiting the analysis to one quarter at a time, A_{t-4}/A_{t-5} may
be expected to show only minor fluctuations, since it represents essentially
the change between the same two consecutive quarters in each year.
It is therefore unlikely to contribute much to the explanation of
E_t/A_{t-1}, because the seasonal for any given quarter is constant if the
seasonal pattern does not change over time.
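To see why, consider a hypothetical series built from a smooth trend and the seasonal factors listed in footnote 12. Treating those factors as multiplicative and using a 0.5 percent quarterly growth rate are assumptions of this sketch, not the paper's procedure. Within a given quarter, A_{t-4}/A_{t-5} is then the same every year.

```python
# Seasonal factors from footnote 12, first to fourth quarter.
SEASONAL = [0.964, 1.135, 0.968, 0.984]


def shipments(k, growth=0.005):
    """Hypothetical series: smooth quarterly growth times a fixed seasonal factor;
    k = 0 is a first quarter."""
    return (1 + growth) ** k * SEASONAL[k % 4]


def seasonal_ratio(t):
    """A_{t-4} / A_{t-5} for quarter t of the hypothetical series."""
    return shipments(t - 4) / shipments(t - 5)


# Second quarters (k % 4 == 1) in three successive years.
ratios = [seasonal_ratio(t) for t in (9, 13, 17)]
print([round(r, 4) for r in ratios])  # [1.1833, 1.1833, 1.1833]
```

In actual data the ratio would move only with irregular influences and changes in the trend rate, which is consistent with X3 adding little in Table 3.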


The main results of this test are summarized in Table 3. Columns 
2 to 5 of this table show the results obtained for each quarter separately 
and column 6 contains the same data for all 55 quarters combined, 
after adjustment for seasonal variation by the procedure described 
earlier.¹²

This table strongly supports the earlier conclusions. From the first
row of the table we observe that the correlation of E_t/A_{t-1} with
A_{t-4}/A_{t-1} is, in all cases, positive and very high, generally about .9.
This means that shippers tend to anticipate expansion when shipments
have been falling and to anticipate contraction when shipments have
been rising. The extent and regularity of this peculiar phenomenon are
brought out in the scatter diagram of Chart 4. In this chart, which is
based on all 55 quarters after seasonal adjustment, the anticipated rate
of change, E_t/A_{t-1}, is plotted against the rate of change over the past
three quarters, A_{t-1}/A_{t-4}, which is the reciprocal of A_{t-4}/A_{t-1}. In this





11 In addition to this method, seasonal variation was eliminated from the data in some cases by
rough estimates of the seasonal factors, as will be discussed later. The two methods yielded similar
results.

12 The seasonal factors used in the correction are as follows: first quarter, .964; second quarter,
1.135; third quarter, .968; fourth quarter, .984.







TABLE 3

MULTIPLE CORRELATIONS ON EXPECTED AND ACTUAL
RATES OF CHANGE OF SHIPPERS' FORECASTS

 (1)        (2)      (3)      (4)      (5)      (6)
 Row*       Q1       Q2       Q3       Q4       Total

 r12        .93      .93      .91      .82      .90
 r13        .20      .04      …        .02      .01
 r42       −.36      …        …        .04      …
 r14       −.14      .24      .43      .44      .01
 r12.3      .94      .94      .90      .90      .91
 r42.3     −.42      .23      .43      .24      .16
 r13.2      .41      .35      .09      .64      .42
 r43.2      .35     −.13      .17      .46      .21

 * X1 = E_t/A_{t-1}, X2 = A_{t-4}/A_{t-1}, X3 = A_{t-4}/A_{t-5}, X4 = A_t/A_{t-1}.


way the chart enables the anticipated rate of growth to be compared 
with the past rate of growth. As is illustrated by the scatter chart, when 
shipments have been rising, shippers anticipate contraction, and vice 
versa. Shippers’ anticipations tend to be against the recent trend, and 
the more so the stronger the trend. 

This trend-reversing character of anticipations finds no justification
or counterpart in the actual rate of change of shipments, as can be seen
from the third row of Table 3. The correlation between A_t/A_{t-1} and
A_{t-4}/A_{t-1}, far from being large and positive, is slightly negative in three
of the quarters and practically zero in the remaining one. For all quarters
seasonally adjusted it is negative, though not significantly different
from zero. It therefore follows that the high positive correlation be-
tween E_t/A_{t-1} and A_{t-4}/A_{t-1} can be explained only by the unusual
manner in which shippers' anticipations are formed: in forecasting the
next quarter, shippers discount the change that has already occurred
and anticipate that shipments will regress from A_{t-1} toward A_{t-4}.

The last four rows of Table 3 present the results obtained when
A_{t-4}/A_{t-5}, or X3, is introduced into the regression relationship. As ex-
pected, this variable does not contribute much to the correlation,
though it is significantly related to E_t/A_{t-1} when X2 is held constant.
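The partial correlations in the last four rows follow the usual first-order formula. A minimal sketch on synthetic data (not the shippers' series; all names are illustrative) computes r12.3 from the three simple correlations and checks it against the equivalent route of correlating the residuals of X1 and X2 after each is regressed on X3.

```python
import math
import random


def pearson(x, y):
    """Ordinary product-moment correlation."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_from_correlations(r12, r13, r23):
    """First-order partial correlation r12.3 from the three simple correlations."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))


def residuals(y, z):
    """Residuals from the least-squares regression of y on z (with intercept)."""
    mz, my = sum(z) / len(z), sum(y) / len(y)
    slope = sum((a - mz) * (b - my) for a, b in zip(z, y)) / sum((a - mz) ** 2 for a in z)
    return [b - my - slope * (a - mz) for a, b in zip(z, y)]


# Synthetic X1, X2, X3 with 55 observations, matching the number of quarters only.
random.seed(3)
x3 = [random.gauss(0, 1) for _ in range(55)]
x2 = [0.6 * v + random.gauss(0, 1) for v in x3]
x1 = [0.9 * a + 0.3 * b + random.gauss(0, 0.5) for a, b in zip(x2, x3)]

r12_3 = partial_from_correlations(pearson(x1, x2), pearson(x1, x3), pearson(x2, x3))
r12_3_check = pearson(residuals(x1, x3), residuals(x2, x3))
print(abs(r12_3 - r12_3_check) < 1e-9)  # True
```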


4.5 Extension of Basic Hypothesis 


Either or both of the two factors involved in (1.4) may in reality be
modified by other considerations. Thus, in regressing toward the past,
the shipper may regress toward A_{t-4} alone, toward A_{t-4} modified by




Chart 4. Scatter Diagram of E_t/A_{t-1} with A_{t-1}/A_{t-4}. (Vertical axis: E_t/A_{t-1};
horizontal axis: A_{t-1}/A_{t-4}.)


A_{t-4}/A_{t-8} to allow for year-to-year trends, or toward the average value
of past shipments in the given quarter as measured, say, by

$$\frac{A_{t-4} + A_{t-8} + \cdots + A_{t-4n}}{n}$$





Similarly, in adjusting the past level of shipments for intervening
growth, the shipper may use the change in shipments from quarter
t-5 to t-1 alone, or with allowance for the rate of change in shipments
by including some such factor as (A_{t-1}/A_{t-5})/(A_{t-2}/A_{t-6}), or, in
addition, the change in shipments from quarter t-9 to quarter t-5. One
could also consider changes in the rate of change of shipments and sim-
ilar trends, but the later empirical results do not provide much
indication that such additional factors would be significant.
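A minimal sketch of the candidate adjustment terms just listed, computed from a list of quarterly shipments; the helper name and the steady-growth series are hypothetical. Under constant growth the rate-of-change factor stays at 1, so it carries information only when growth speeds up or slows down.

```python
def adjustment_factors(a, t):
    """Hypothetical helper: candidate adjustment terms for quarter t from a list `a`
    of quarterly shipments (requires t >= 9)."""
    past_year = a[t - 1] / a[t - 5]                      # change from t-5 to t-1
    rate_of_change = past_year / (a[t - 2] / a[t - 6])   # (A_{t-1}/A_{t-5})/(A_{t-2}/A_{t-6})
    year_before = a[t - 5] / a[t - 9]                    # change from t-9 to t-5
    return past_year, rate_of_change, year_before


# Hypothetical steady 1 percent quarterly growth.
steady = [100 * 1.01 ** k for k in range(12)]
print([round(f, 4) for f in adjustment_factors(steady, 10)])  # [1.0406, 1.0, 1.0406]
```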

By combining the level-factor hypotheses with the adjustment-factor
hypotheses, a number of functions were constructed for empirical
study. Of these, the ones that worked out best (in terms of goodness of
fit, significance of coefficients, effect of substituting A_t as the dependent
variable, serial correlation of the residuals, and accuracy of postwar estimates)










are shown in Table 4. All of these functions fit the data very well. The 
use of ratio variables yielded somewhat closer fits than the use of dif- 
ferences, but there was little to be said regarding the merits of loga- 
rithmic versus arithmetic forms. 

The results suggest that the rate of change in shipments, (A_{t-1}/A_{t-5})
/(A_{t-2}/A_{t-6}), plays some role in the forecasts. This variable is not
only statistically significant; its addition also reduces the serial


TABLE 4 


ESTIMATES OF PARAMETERS OF SELECTED FUNCTIONS 
EXPLAINING SHIPPERS’ FORECAST 








(5) (6) 
(2) (3) Rt (4) Average absolute 


Function* E,de- Arde —_ percent error 
pendent pendent oheare. 1947-50 





E,= .087+-.556°* A¢_st. = Ag_a [(Agr/At—s)—1] -972 -900 3.0% 2.4% 
(.014) (.022) (.036) 


Ey= .073-+-.467°* Ag_ot-.411°* Age _[(Ag_s/Ag_s)—1] -973 -899 3.0 2.6 
(.014) (. 148) 033 


a (Ag_s/At_s) | 
+.114 -1 
(. O70)" (At_e/At_s) 
log E;=log . A 972°* log Ay_«+-.451°* log [Ag_1/Az_s} - 9796 -893 
(.002) (.021) (.030) 
—.080* log [Ag_/Ag_s] 
(.031) 


log E;=log . 0+. 972** log Ag_s+. 424°* log [At_s/ Ata] -9803 =. 894 
(.002) (.025) (.026) 


A 
$000 tg | 2 =| —.047 log [Ay_4/Ar_} 
(063) 


Agis/Atst (.038) 





* Figures in parentheses are standard errors of coefficients. One asterisk indicates significance at the .05 probability
level; two asterisks, at the .01 significance level.


correlation in the residuals of the other functions to a level where it is 
no longer statistically significant. The positive sign of this coefficient 
suggests that the regression of expectations toward the past is some- 
what modified by the recent rate of change of shipments. At the same 
time, the coefficient of this variable indicates that this modification
amounts to only about 10 percent of the recent rate of change, so that
regression toward the corresponding quarter of the previous year still
remains the dominant pattern of the forecast.

The hypothesis that the shipper adjusts A_{t-4} for recent trends by
means of the ratio A_{t-4}/A_{t-8} proved to be the best of the level hypoth-
eses. Particular interest attaches to the fact that the coefficient of this
variable is negative, indicating that the forecasts tend to be lower when
A_{t-4} is large relative to A_{t-8}, and that the level toward which the
expectations regress is some average of A_{t-4} and A_{t-8}, instead of merely
A_{t-4}.







These tests would therefore seem to lead to the conclusion that the 
hypothesis best explaining the structure of the over-all forecasts is of 
the nature: 





$$E_t = f\left(A_{t-4},\ \frac{A_{t-1}}{A_{t-5}},\ \frac{A_{t-1}/A_{t-5}}{A_{t-2}/A_{t-6}},\ \frac{A_{t-4}}{A_{t-8}}\right)$$


Though this function fits the data very closely, it is disappointing that 
the residuals show some evidence of positive serial correlation, at least 
at the .05 level of significance. This is rather surprising in view of the 
fact that no significant serial correlation appears in the residuals of the
same function excluding A_{t-4}/A_{t-8}, i.e., hypothesis J; and it may
indicate that at least one other relevant systematic factor is being omitted.
However, tests with various other possible variables proved unsuccess- 
ful in uncovering the missing factor. 

The last two variables in this function also do not appear to be 
statistically significant. Nevertheless, the fact that each variable taken 
separately is significant (hypotheses F and J), and that a combination 
of the two variables significantly improves the goodness of fit of the 
function, seems to bear out the relevance of these variables. 

The last three columns present additional evidence on the reliability 
of the results. Column 4 shows that the goodness of fit of the functions
with A, dependent is also very good, but nevertheless nowhere near as 
high as when E, is the dependent variable. The last two columns present 
what is perhaps the acid test of a satisfactory forecasting function, its 
accuracy outside the period of observation. The history of economic 
statistics is replete with hypotheses which were highly successful with 
reference to the period under study but which proved ineffective when 
used for forecasting. 

In the present case, the regression functions are, if anything, appar- 
ently more accurate for predictions than they were during the period of 
observation. The probable explanation for this phenomenon, however, 
is simply the greater amplitude of the fluctuations of the shippers’ 
estimates during the period of observation. To test the plausibility of 
this explanation a “coefficient of determination of prediction” for hy- 
pothesis L was computed as 1 minus the ratio of the variance of the 
postwar residuals to the variance of the postwar shippers’ estimates. 
The result, .79, is a good deal below the corresponding determination
coefficient of .98 for the period of observation. This is about what
would be expected considering the nature of the postwar period, the
frequency of strikes, and the relative accuracy of the predictions.¹³
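The "coefficient of determination of prediction" described above is a simple variance ratio. A minimal sketch with made-up postwar figures (not the paper's data; the function name is hypothetical), taking population variances for both terms so that the divisor cancels:

```python
from statistics import pvariance


def determination_of_prediction(estimates, predictions):
    """1 - var(postwar residuals) / var(postwar shippers' estimates)."""
    residuals = [e - p for e, p in zip(estimates, predictions)]
    return 1 - pvariance(residuals) / pvariance(estimates)


# Hypothetical postwar shippers' estimates E_t and values predicted by a fitted function.
estimates = [2.0, 2.4, 2.2, 2.8]
predictions = [2.1, 2.3, 2.3, 2.7]
print(round(determination_of_prediction(estimates, predictions), 3))  # 0.886
```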





13 In practice, allowance was made for unusual events in a particular quarter by omitting the data









5. IMPROVEMENT OF THE FORECASTS 


Of the numerous ways in which attempts might be made to improve 
the forecasting value of the shippers’ forecasts, attention was focused 
on the light the residuals of the hypotheses might cast on the course of 
shipments. Two distinct questions may be raised in this connection, 
namely: 

1) Do the residuals of the functions estimating anticipations,
E_t − E_t^c, seem to be associated with the deviations of actual
shipments from the function estimate, i.e., A_t − E_t^c? If the
deviation of expectations from the explained component of E_t is in
the right direction, it would indicate that these deviations tend to
improve the forecasts, and would also suggest the possible omission
of other factors influencing expectations that have a more direct
bearing on actual shipments.

2) If the answer to the first question is in the affirmative, can we
make use of this information to derive a more accurate prediction
of actual shipments than is obtainable through the use of the
shippers' anticipations alone? Of the many forms such an attempt
might take, limited resources restrict us to inserting U_t as an
additional variable in a regression relating A_t and E_t, i.e., taking
A_t as a function of U_t and of E_t. What this does in effect is to
increase the importance of U_t relative to E_t on the basis of the
extent to which each of these variables is associated with A_t; and
from a forecasting point of view this is exactly what we want, if U_t
is in the direction of actual shipments.


5.1 Comparison of Residuals 


One means of answering the first of the questions raised about the
use of the residuals is to compute the partial correlation of $A_t$ on
$E_t$, holding $E_t^c$ constant. These partial correlations are shown in
Table 5 for all manufactured commodities and for each of five selected
industries. The values of $E_t^c$ for total manufactured commodities are
based on hypothesis L. The values for individual industries are based on
the most appropriate hypothesis in each case.
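A first-order partial correlation such as those in Table 5 can be obtained in two equivalent ways: from the three zero-order correlations, or by correlating the residuals left after regressing $A_t$ and $E_t$ separately on $E_t^c$. The minimal Python sketch below shows both routes. The names `A`, `E`, `Ec` and the numbers are hypothetical and were chosen only to show that the two routes agree.

```python
from math import sqrt

def corr(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def partial_corr(x, y, z):
    """First-order partial correlation of x and y, holding z constant."""
    rxy, rxz, ryz = corr(x, y), corr(x, z), corr(y, z)
    return (rxy - rxz * ryz) / sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

def residuals(y, x):
    """Residuals from the least-squares line of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return [b - my - slope * (a - mx) for a, b in zip(x, y)]

# Hypothetical series: actual shipments A, forecasts E, and E^c from a fitted hypothesis.
A = [10, 12, 9, 15, 14, 11, 13, 16]
E = [11, 12, 10, 14, 15, 10, 12, 15]
Ec = [10, 11, 10, 13, 14, 11, 12, 14]

print(partial_corr(A, E, Ec))
print(corr(residuals(A, Ec), residuals(E, Ec)))   # same value, by a standard identity
```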





for that quarter from the analysis, if some adjustment could not be made. In most instances, however, 
adjustments were possible. For example, the most frequent of the unusual events is a labor stoppage, 
and from information on the duration of the stoppage and its effect on industry employment or produc- 
tion, the amount of production or shipments that would have occurred in the absence of the stoppage 
can be estimated fairly well. Unfortunately, it is very difficult to make a similar adjustment in the 
shippers’ forecasts, as there is generally no easy way of measuring the extent to which the event may
have been anticipated. 





410 AMERICAN STATISTICAL ASSOCIATION JOURNAL, SEPTEMBER 1953


Except for flour, some tendency apparently does exist for the
residuals of the forecasts to be associated with the deviation of $A_t$
from $E_t^c$. This tendency does not seem to be very great, though the
correlations are significant at the .05 probability level ($r_{.05} = .336$)
and some also at the .01 significance level ($r_{.01} = .410$). The fact
that these five correlations are significant, however, would indicate
once again that relevant variables are omitted from our hypotheses
despite the very close fits obtained for them.

Examination of these relationships by phase of the cycle also revealed
little evidence of association between the residuals, with one
outstanding exception, the 1937-38 recession. In the last three quarters
of 1938, when the use of $E_t^c$ would have led to substantial
overestimates, the shippers departed markedly from this formula in their
forecasts and anticipated shipments much better than the formula itself.
The same phenomenon in reverse occurred in the last quarter of 1938 and
the first two quarters of 1939.


TABLE 5

PARTIAL CORRELATION COEFFICIENTS OF $A_t$ ON $E_t$,
HOLDING $E_t^c$ CONSTANT

Industry                      Partial correlation coefficient

Iron and steel                         .41
Lumber                                 .46
Flour                                 −.25
Cement                                 .38
Agricultural implements                .45
All manufactured goods                 .44

For individual industries, the residuals are correlated much more
closely in the later years of the prewar period, from 1937 to 1941.
Lumber and iron and steel shippers in particular seem to have
anticipated very well the fluctuations of their industry’s shipments
during this period. The coefficients of determination between
$A_t - E_t^c$ and $E_t - E_t^c$ from 1935 to 1941 for these two industries
are .49 and .81, respectively.⁴ For the postwar period, however, little
relationship was detected between the two sets of residuals.


5.2 Use of Residuals to Improve Accuracy of Forecasts


Since some relationship was detected between the residuals in the 





4 For the entire prewar period, the corresponding coefficients are .18 and .16, respectively. 







prewar period, it would seem worthwhile to investigate the effect of the
residuals on the forecasts. Multiple regressions of $A_t$ on $E_t$ and
$U_t$ were computed for the prewar period for each of the five industry
functions and for total manufactured commodities. Estimates of the
relevant correlation parameters are presented in Table 6.

From the point of view of the present analysis, the parameter of
principal interest in this table is $r_{13.2}$, the partial correlation
coefficient of $A_t$ with $U_t$ when $E_t$ is held constant. Since
$U_t = E_t - E_t^c$, this is the same, apart from sign, as the partial
correlation of $A_t$ with $E_t^c$, holding $E_t$ constant. If the
residuals do provide some indication of the future course of shipments
in the prewar period, some correlation between the residuals and actual
shipments

TABLE 6

EFFECT OF RESIDUALS ON ACCURACY OF SHIPPERS’ FORECASTS,
BY SELECTED INDUSTRIES, 1927-41

              Total mfd.    Iron and                                 Agric.
Measure*      commodities   steel      Lumber     Flour    Cement     implements

$r_{12}$      .92           .84        [illeg.]   .84      .96        .93
$r_{13}$      .20           .24        [illeg.]   −.11     [illeg.]   .19
$r_{13.2}$    .10           .07        [illeg.]   .17      −.04       .12
$R_{1.23}$    .92           .84        [illeg.]   .84      .96        .93

* $X_1 = A_t$; $X_2 = E_t$; $X_3 = U_t = E_t - E_t^c$.
should be evident when the shippers’ forecasts are held constant. In 
addition, our hypothesis as to the nature of the relationship between 
the residuals and actual shipments postulates that any such correlation 
that exists should be positive. 

The estimates of $r_{13.2}$ in Table 6 do not provide any evidence that
the residuals are of value in indicating future trends. The values of
$r_{13.2}$ are in the right direction in all but one of the six cases,
but in no instance is the estimate of the coefficient statistically
significant at the .05 probability level. As a result, adding the
residuals to the regression function fails to provide any noticeable
improvement in the goodness of fit, as a comparison of each value of
$r_{12}$ with the corresponding value of $R_{1.23}$ shows. Although the
deviations of the forecasts from the function are in the right
direction, the relationship is apparently not systematic enough to serve
as a basis for forecasting. Much the same results were found in the
case of regions.
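The comparison of $r_{12}$ with $R_{1.23}$ can be made precise with the standard identity $1 - R_{1.23}^2 = (1 - r_{12}^2)(1 - r_{13.2}^2)$. By this identity, adding $X_3$ raises $R^2$ by only $(1 - r_{12}^2)\,r_{13.2}^2$. The short Python sketch below takes the first column of Table 6 as input. The function name is ours.

```python
from math import sqrt

def multiple_r(r12, r13_2):
    """Multiple correlation R_{1.23} from r_12 and the partial correlation r_{13.2}.

    Uses 1 - R^2_{1.23} = (1 - r^2_{12}) * (1 - r^2_{13.2}).
    """
    return sqrt(1 - (1 - r12 ** 2) * (1 - r13_2 ** 2))

# First column of Table 6: r_12 = .92, r_13.2 = .10
print(round(multiple_r(0.92, 0.10), 3))   # 0.921
```

With $r_{13.2} = .10$ the gain in $R^2$ is about .0015. That gain disappears when $R_{1.23}$ is rounded to two places, which is consistent with the equal entries in that column.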







All in all, therefore, it would seem that improving the accuracy of
the forecasts by adding the residuals of the shippers’ estimating
function as an extra variable in the regression of $A_t$ on $E_t$ does
hold some promise, but only if used on a selective basis. In other
words, judicious use of the residuals for some industries and not for
others might lead to greater accuracy, not only for those particular
industries but on an over-all basis as well.


6. SUMMARY 


The main findings of this study revolve around the questions of the 
accuracy and the structure of the shippers’ forecasts. On the subject 
of accuracy, we have obtained the following main results: 

1. The forecasts tend to lag behind observed changes. Turning 
points, in particular, are almost invariably overshot by at least one 
quarter. Although the average percentage error of the forecasts is not 
high, the rate of change of shipments is generally missed altogether. 

2. The forecasts tend to err more when carloadings are declining 
than when they are rising, and are most accurate when carloadings 
remain approximately level.

3. The forecasts do not compare favorably in general with other ele- 
mentary forecasting models but are somewhat more accurate in the 
postwar years, 1946-50. The latter may be due either to inherent im- 
provement in the forecasts or to the special circumstances prevailing 
during this period. 

On the structure of the forecasts, the main results may be summar- 
ized as follows: 

1. The preparation of the forecasts appears to involve modification 
of the shippers’ carloadings in the corresponding quarter of the previous 
year for the change in trend in carloadings over the year. These two 
factors alone account for over 97 per cent of the variance in the fore- 
casts. The addition of variables reflecting past rates of change increased 
the explained variance somewhat further. 

2. In applying the above adjustment, the shippers in the aggregate
tend to allow for only a fraction of the change that has occurred in the
past year. The result is a regression of shipment anticipations toward
the past, particularly toward $A_{t-4}$ modified by $A_{t-5}$. This
regression phenomenon, which seems to have no counterpart in actual
carloadings, explains why the forecasted change is typically counter to
the recent trend.

3. There is some evidence that the residuals of the regression equa- 
tions explaining the formation of expectations, though very small, are 







associated with the deviations of actual shipments from the equation 
estimates. However, attempts to improve the accuracy of the forecasts 
by correlating actual shipments with the forecasts and with these re- 
siduals met with only limited success. Evidently, the relationship is not 
sufficiently systematic to be of much help. 

These findings are not without limitations, a brief review of which 
serves both to place the findings in a proper perspective and to point 
the way to future work in the field. In the main, four such limitations 
would seem to exist. First, the data may represent the anticipations 
of only one sector of the business community. Although the people 
represented—typically traffic managers—are of some importance in 
their firms, they are probably not on a policy level and may not be fully 
informed of the firm’s future operations. 

Second, it cannot be overemphasized that the entire analysis has
been carried out in terms of aggregates and that we have no direct 
evidence as to the frequency or even the existence of the regression 
phenomenon among individual shippers’ forecasts. Thus, this phe- 
nomenon as observed in this study might conceivably have resulted 
from extrapolation of the level of the corresponding quarter of the pre- 
ceding year by a large group of the respondents and extrapolation of 
trend by another large group. 

Third, the results refer to quarterly data only. This study presents 
no information on the effect of some other time unit on the accuracy 
and structure of anticipations. Fourth, the unavailability of other data 
may exaggerate the extent to which the shippers rely on past railroad
shipments in arriving at their forecasts. Thus, orders data proved rele- 
vant in two of three industry-region functions for which they were avail- 
able. The removal of these limitations through further study of the 
railroad shippers’ forecasts and securing other data on expectations is a 
task for future work in this field. 





ELECTRONIC COMPUTATION IN ECONOMIC STATISTICS 


J. A. C. Brown, H. S. Houthakker, and S. J. Prais
University of Cambridge* 


1. INTRODUCTION 


Technical advances in electronic engineering which have taken
place in the last decade have led to an enormous advance in com-
puting technology with the recent development of a general purpose
electronic computer. The main features of this are its high speed of
operation and the ease with which the user is able to adapt the auto-
matic facilities of the machine to his own particular problem by means
of a “program”. In other words, the user is able to introduce loop-
systems of any desired degree of complexity into the machine with great
facility.

As is to be expected, serious discussions of the use of the computer 
have so far generally been confined to engineering and mathematical 
circles,¹ with the result that potential users from other fields such as
economics and statistics have not fully appreciated the advantages that 
the new techniques offer in the practical solution of problems and the 
opening of new lines of research. It is the object of this paper to give a 
non-technical description of one electronic computer known to the au- 
thors and an account of some of its applications in the field of economic 
statistics. 

The ensuing discussion will be in the following order. In the second 
section a brief account is given of the characteristics of an electronic 
computer limited, however, to the extent to which this is required for 
an understanding of its applications. None of the engineering aspects 
will be discussed. The reader who is interested in further details may 
be referred to the excellent Cantor lectures [15] delivered by Dr. M. V. 
Wilkes, Director of the University Mathematical Laboratory in Cam- 
bridge, to the Royal Society of Arts in November 1951, for a clear and 
full description of this and other automatic computers.²

The alternatives currently provided by punched card methods and 
desk calculating machines are surveyed in the third section and some 
attention is given to the question of relative costs. This leads to a
discussion in the fourth section of the delicate problem of the “economics of





* The second author is now at the University of Chicago. 

1 The popular (and not so popular) philosophical discussions of the “electronic brain” require no
consideration here.

2 For a detailed description of programming on the Edsac reference may be made to the account by 
Wilkes, Wheeler, and Gill [17], and the more elementary account by Hartree [6], Chapter XII.







ELECTRONIC COMPUTATION IN ECONOMIC STATISTICS 415 


programming”. In the fifth section a more detailed account is given of 
some of the statistical problems that have so far been solved by the use 
of the electronic computer, and in the final section the argument is 
summarised. 


2. A DESCRIPTION OF THE ELECTRONIC COMPUTER 


The applications to be described below were made on the Edsac, the 
Electronic Delay Storage Automatic Calculator of the University 
Mathematical Laboratory in Cambridge.³ This machine, which started
operation in 1949, is suitable for all mathematical manipulations that 
can be put in numerical form; that is, it is a digital machine and not an 
analogue machine (such as a slide rule). It will carry out sequences of 
elementary ‘orders’, such as addition and multiplication, which are 
determined in advance by the user and fed into the machine by means 
of punched paper tape together with the numerical data (if any). These 
orders and numbers are stored by the Edsac in its “store” or “memory”, 
consisting of about 1000 “storage locations”, whence they can be trans- 
ferred to the “arithmetic unit” whenever necessary. The latter unit, in 
which actual operations take place, includes an “accumulator” and a 
“multiplier register” which are analogous to the registers of a desk cal- 
culator. Each storage location can hold one order or one “short” num- 
ber of 17 binary digits, but two short locations can be joined to ac- 
commodate one “long” number of 35 binary digits, equivalent to about 
10 decimal digits. The machine operates entirely in the scale of two, 
although input and output are normally in the decimal system, the 
necessary binary-decimal conversion being performed by orders pre- 
viously fed into the machine. Output from the machine is on a tele- 
printer or paper tape. 
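The decimal equivalent quoted above follows from multiplying the number of binary digits by $\log_{10} 2 \approx 0.301$. The Python check below is rough: it treats every binary digit as significant and ignores the Edsac’s use of one digit for the sign.

```python
from math import log10

def decimal_digits(bits):
    """Decimal digits roughly equivalent to a word of `bits` binary digits."""
    return bits * log10(2)

for name, bits in (("short", 17), ("long", 35)):
    print(f"{name} number: {bits} binary digits ~ {decimal_digits(bits):.1f} decimal digits")
```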

The high speed at which orders are carried out (an addition takes 1.5 
milliseconds, a multiplication 6 milliseconds) makes it desirable that 
the execution of “programs” (sequences of orders properly combined) 
should require as little human intervention as possible. In principle the 
only stimulus needed by the Edsac to take in and carry out a complete 
program is the pressing of the start button; the “control unit” of the 
machine takes over from then on. This explains the word “automatic” 
in its name. In complicated programs, however, it is occasionally con- 
venient to suspend operations briefly so that the user can intervene on 
the basis of intermediate results. 

A program therefore has to specify in complete detail (using the Ed- 
sac order code) the elementary operations that are necessary to solve 





3 For a technical description, see the articles by Wilkes and Renwick [16], and by Wheeler [14]. 







the problem in hand, taking into account all the possible situations that 
may arise during its execution. A simple example may help to explain 
the nature of a program and the detail in which it has to be specified. 

Suppose the square root of the positive number $y$ is to be found by
means of the first order iterative process

$$x_{i+1} = x_i - x_i^2 + y,$$

where $x_i$ converges to $\sqrt{y}$ from below if
$0 < y < \tfrac{1}{2}(3 - \sqrt{5}) \approx 0.38$, as is assumed here, and
$x_0$, the first trial value, is zero. The sequence of orders will then
be as follows, supposing that $y$ is stored in location 1 and $x_i$ in
location 2.


(1) Put the contents of location 2 into the multiplier register.

(2) Multiply the number in location 2 by the number in the multiplier
register and subtract the result, $x_i^2$, from the accumulator (supposed
to be clear previously).

(3) Add the number in location 1 into the accumulator, obtaining
$y - x_i^2$.

(4) Test whether the number in the accumulator (which for the admitted
value of $y$ cannot be positive) is non-negative; if it is, the process
ends ($x_i^2$ being equal to $y$); otherwise, proceed to the next order.

(5) Add the number in location 2 into the accumulator, obtaining
$x_i - x_i^2 + y = x_{i+1}$.

(6) Transfer the number in the accumulator to location 2, leaving
the accumulator clear.

(7) Return to order (1), starting a new iteration.
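The iteration that orders (1) to (7) carry out can be sketched in a few lines of modern Python. This is a floating-point illustration built on our own assumptions, not an emulation of the Edsac. The stopping rule $|y - x_i^2| < \text{tol}$ and the names `edsac_style_sqrt`, `tol` and `max_iter` are hypothetical stand-ins for the accumulator test in order (4).

```python
def edsac_style_sqrt(y, tol=1e-12, max_iter=100_000):
    """First-order iteration x_{i+1} = x_i - x_i**2 + y, starting from x_0 = 0.

    The text assumes 0 < y < (3 - 5**0.5) / 2 (about 0.38).
    Stops when |y - x_i**2| < tol; returns (root, iterations used).
    """
    if not 0 < y < (3 - 5 ** 0.5) / 2:
        raise ValueError("y outside the range assumed in the text")
    x = 0.0
    for i in range(max_iter):
        residual = y - x * x        # orders (1)-(3): accumulator holds y - x_i^2
        if abs(residual) < tol:     # order (4), here with a floating-point tolerance
            return x, i
        x = x + residual            # orders (5)-(6): x_{i+1} = x_i - x_i^2 + y
    raise RuntimeError("no convergence within max_iter")

root, steps = edsac_style_sqrt(0.25)
print(root, steps)
```

The absolute value in the test lets the sketch stop correctly from either side of $\sqrt{y}$.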


If these orders are stored in the locations beginning at 100, they are 
coded as below where the “function letters” have the following mean- 
ings: 


Hn means put the number in location n into the multiplier register. 

Nn means multiply the number in location n by the number in the 
multiplier register and subtract the result from the number in 
the accumulator. 

An means add the number in location n to the number in the ac-
cumulator.

Tn means transfer the number in the accumulator to location n 
leaving the accumulator cleared. 

En means test whether the number in the accumulator is positive 
or zero. If it is, proceed next to location n; otherwise, proceed 
serially. 





SET OF ORDERS FOR COMPUTING A SQUARE ROOT

Location    Order

100         H 2
101         N 2
102         A 1
103         E [illeg.]
104         A 2
105         T 2
106         E 100


The most important order in this program is that used in (4), in 
which the choice of the next operation is made to depend on the state 
of the accumulator at that time. Normally, orders are carried out se- 
rially according to their position in the memory, but by such “condi- 
tional transfers of control” the sequence can be altered whenever nec- 
essary. Without this facility it would not be possible to determine 
automatically whether an iterative or other repetitive process is
completed, and it is therefore an essential feature of automatic
computation.⁴ In this way the whole or parts of a sequence of orders can
be carried out repeatedly, as is the case here. If necessary, orders elsewhere
in the program can change some orders in a sequence (especially their 
addresses) from one “cycle” to the next one, which may lead to a con- 
siderable saving in the number of orders to be stored. Thus if it is re- 
quired to add together the numbers in locations 200 to 299 we do not 
need a hundred A-orders with different addresses, but only one A-order 
whose address is increased by one in each cycle. 
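The address-modification idea can be shown with a toy interpreter. In this Python sketch the function letters A, T and E behave as described above. The letters S (subtract) and Z (stop), which this paper does not describe, stand in for the corresponding Edsac orders. The numeric encoding of orders (letter code × 1000 + address) is purely hypothetical. In the program, a single A-order at location 101 adds the numbers in locations 200 to 299, because later orders add 1 to its address on every cycle.

```python
# Toy encoding (hypothetical): order word = letter code * 1000 + address.
CODES = {"A": 1, "S": 2, "T": 3, "E": 4, "Z": 5}
LETTERS = {code: letter for letter, code in CODES.items()}

def order(letter, address=0):
    """Encode one order as an integer so that other orders can do arithmetic on it."""
    return CODES[letter] * 1000 + address

def run(memory, start, max_steps=10_000):
    """Execute orders from `start` until a Z order; return the final accumulator."""
    acc, pc = 0, start
    for _ in range(max_steps):
        letter, n = LETTERS[memory[pc] // 1000], memory[pc] % 1000
        pc += 1                          # proceed serially unless E transfers control
        if letter == "A":                # add number in location n to the accumulator
            acc += memory[n]
        elif letter == "S":              # subtract number in location n
            acc -= memory[n]
        elif letter == "T":              # transfer to location n, clearing the accumulator
            memory[n], acc = acc, 0
        elif letter == "E":              # if accumulator >= 0, continue at location n
            if acc >= 0:
                pc = n
        elif letter == "Z":              # stop
            return acc
    raise RuntimeError("step limit reached")

memory = [0] * 400
memory[200:300] = range(1, 101)                   # the numbers to be added: 1, 2, ..., 100
memory[10], memory[11], memory[12] = 0, -100, 1   # running total, cycle counter, constant 1

program = [
    order("A", 10),   # 100: accumulator <- running total
    order("A", 200),  # 101: add the next number (this order's address is modified)
    order("T", 10),   # 102: store running total, clear accumulator
    order("A", 101),  # 103: fetch order 101 itself as a number
    order("A", 12),   # 104: add 1, i.e. step its address
    order("T", 101),  # 105: write the modified order back
    order("A", 11),   # 106: cycle counter ...
    order("A", 12),   # 107: ... plus 1
    order("T", 11),   # 108: store counter
    order("A", 11),   # 109: reload counter
    order("E", 113),  # 110: counter >= 0 means all 100 numbers are added: go to stop
    order("S", 11),   # 111: clear accumulator (counter - counter)
    order("E", 100),  # 112: accumulator is zero, so this always jumps back
    order("Z"),       # 113: stop
]
memory[100:100 + len(program)] = program
run(memory, 100)
print(memory[10])                        # 5050
print(memory[101] == order("A", 300))    # True: the A-order's address was stepped 100 times
```

Order 112 uses the fact that a T order leaves the accumulator clear (here, zero after the S order), so an E order then acts as an unconditional return.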

In summary, (a) the use of conditional orders, (b) the short time
required for each operation, and (c) the large number of storage locations
for holding intermediate results, enable the machine to perform the 
most extensive and complicated calculations with great rapidity. 


3. THE CHOICE BETWEEN ELECTRONIC AND 
OTHER CALCULATING MACHINES 


In any major statistical problem the effective choice which faces the 
research worker lies between the fully automatic electronic machine 
and the more usual equipment for handling punched cards consisting 
of card sorters, tabulators, and a number of auxiliary machines. The 
choice between these two types of machine on the one hand and the 
desk machine on the other is usually too straightforward to merit de- 





4 There are five other orders in the Edsac code which transfer control in a number of specified 
circumstances. 







tailed discussion: it suffices to note that the use of desk machines is ad- 
visable in the exploratory stages of a piece of analysis, to determine 
orders of magnitude, to forecast the probable character of the results, 
and to provide checks which may not be included in the program for the 
automatic machines. Occasionally some work may be left for the desk 
machines in the way of scalar transformation of the results, and one or 
two subsidiary calculations which are best omitted from the more auto- 
matic processes; but in principle there is no need for this. 

The factor determining the choice between the two major types of 
machines may best be considered in relation to the processes which are 
carried out in solving a typical statistical problem from the recording 
of the original data to the attainment of the final results. These proc- 
esses usually comprise (1) recording the data, (2) classifying and sort- 
ing, (3) summarising, and (4) the estimation of numerical relationships. 
In general, if the amount of original data is large, the advantages of 
punched card machines are at present greatest in the first three of these 
processes and of automatic electronic machines in the last. 

Punched cards provide a compact and cheap form of record to which 
later reference is easy, particularly if the information on the cards is 
reproduced in numbers and letters along one edge as can be done by an 
automatic interpreter. The sorting of the cards is extremely rapid (up 
to 40,000 card columns per hour), and once the cards are sorted they 
can be quickly summarised on the standard tabulator in which a rela- 
tive slowness of individual arithmetical operations is compensated by 
the ability to carry out a number of operations in parallel, together with 
a parallel type output of the results. On the other hand the range of 
arithmetical operations which can be carried out without recording and 
feeding back intermediate results is severely restricted, so that the use 
of normal punched card machines for the estimation of numerical re- 
lationships from a relatively small amount of summarised data is rarely 
economical. 

With most of the currently operating electronic machines the cost of 
reading and printing a large quantity of information is high and it is 
usually inefficient to use these machines where the ratio of input plus 
output time to computing time is large. Many statistical problems are 
closer in terms of this ratio to those which arise in commerce than to 
those which arise in mathematical or physical problems, and it is of 
interest to quote an estimate which has been made in the former field. 

Bowden [2] has recently considered the application of the electronic 
computer at Manchester University to the production of a weekly pay- 
roll for a factory of 3,500 employees, and has estimated that whereas 
all the numerical computations could be carried out in 48 minutes the 







printing of the payroll by a standard teleprinter would take some 12 to 
14 hours (though recent developments in parallel output mechanisms 
will reduce this drastically). Further, the storage on magnetic drums of 
all the information which it would be necessary to carry forward each 
week would be prohibitively expensive. 
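The imbalance in Bowden’s estimate is easiest to see as a ratio of output time to computing time. The trivial Python sketch below uses only the figures quoted above. The helper name is ours.

```python
def io_ratio(io_minutes, compute_minutes):
    """Minutes of input/output per minute of actual computing."""
    return io_minutes / compute_minutes

# Bowden's payroll estimate quoted in the text: 48 minutes of computing,
# 12 to 14 hours of teleprinter output.
for hours in (12, 14):
    io = hours * 60
    share = io / (io + 48)
    print(f"{hours} h printing: ratio {io_ratio(io, 48):.1f}, output share {share:.0%}")
```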

Few statistical applications have such a high ratio of input and out- 
put to computing time as this, but two illustrations may be given from 
our own experience. The first example concerned the calculation of 
some 2,100 correlation coefficients of zero, first and second order from 
a matrix of sums of squares and cross-products of order 37 × 37. On the
Edsac this was completed in 100 minutes, of which about 80 were
accounted for by input and output. Nevertheless the use of the Edsac
was economical, since the individual numerical processes were too
complex to lie within the range of punched card machines. The second
example was the formation of a matrix of sums of squares and
cross-products of order 8 × 8 from data which comprised 2,200 observations of
each variable. This calculation was completed on punched card equip- 
ment with about 8 hours sorting and tabulating, whereas the Edsac 
would have taken about 2 hours to read the information. The computa- 
tion and accumulation of the cross-products are carried out almost si- 
multaneously, but in view of the danger of any momentary failure 
which would have rendered the final result worthless, it would be ad- 
visable to split the operation into a number of parts which would be 
summed later on a desk machine. 
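Splitting the second calculation into parts is safe because a matrix of sums of squares and cross-products is a plain sum over observations. Partial matrices formed from separate blocks of observations therefore add up to the full one. The Python sketch below uses hypothetical integer data of the same shape (2,200 observations of 8 variables). It forms raw sums; deviations from the means can be obtained afterwards by subtracting $n\bar{x}_i\bar{x}_j$.

```python
import random

def cross_products(rows, k):
    """k x k matrix of raw sums of squares and cross-products, sum of x_i * x_j over rows."""
    m = [[0.0] * k for _ in range(k)]
    for row in rows:
        for i in range(k):
            for j in range(k):
                m[i][j] += row[i] * row[j]
    return m

def cross_products_in_parts(rows, k, parts):
    """The same matrix, formed block by block and summed afterwards."""
    size = -(-len(rows) // parts)          # ceiling division
    total = [[0.0] * k for _ in range(k)]
    for start in range(0, len(rows), size):
        partial = cross_products(rows[start:start + size], k)
        for i in range(k):
            for j in range(k):
                total[i][j] += partial[i][j]
    return total

random.seed(1)
data = [[random.randint(0, 99) for _ in range(8)] for _ in range(2200)]  # hypothetical data
whole = cross_products(data, 8)
split = cross_products_in_parts(data, 8, parts=4)
print(whole == split)
```

With integer data every partial sum is exact in floating point, so the two matrices agree exactly and not merely to rounding.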

It should be stressed that the points made in the preceding para- 
graphs are necessarily provisional since computing technology is still 
changing rapidly. On the one hand the range of punched card equip- 
ment is being extended as, for example, by the development of an elec- 
tronic multiplier with facilities for transfer between registers. On the 
other hand attention is being given by the designers of the large elec- 
tronic machines to improving input and output facilities, including 
parallel type output, and input and output devices which can operate 
at the same time as computations are in progress in other parts of the 
machine. Better and cheaper forms of semi-permanent storage are also 
to be expected. 

One further point should be kept in mind. The technical knowledge 
required to operate a punched card machine is widely dispersed, and 
most of the methods which are useful in statistical problems are well 
established.⁵ By contrast the research worker must usually expect to
invest a good deal of time, sometimes many weeks, in programming a 





5 For an introductory account of punched card methods in the analyses of survey data see Yates 
[18] Sections 5.11 to 5.19. 







new problem for an electronic computer. Even where electronic methods
would clearly be best once a program exists, they may be eschewed if such
a program is likely to be used no more than once. In general if the amount
of time which can be spared for programming is limited, it will pay to 
invest it in the construction of programs of the most general applicabil- 
ity and to use the more traditional computing devices for problems with 
strong individual characteristics even at the cost of extra computing 
time. 

Since the technical developments are still in full progress it is too 
early to say anything definite on financial costs, but at present the cost 
of a typical full-sized high speed machine with input-output equipment 
adequate for statistical purposes in the United States is probably in 
the region between $400,000 and $1,000,000, equivalent to between
$100 and $300 per hour of utilization. In Britain the cost per hour of 
utilization is probably between £50 and £100. Thus very roughly, the 
electronic machines are about 100 times as expensive to use as desk 
machines, and about 25 times as expensive as punched card machines. 
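The ratios in the last sentence imply rough hourly costs for the other two kinds of equipment. The implied figures below are our own arithmetic on the United States range; the authors do not give them.

```python
def implied_rates(electronic_per_hour):
    """Hourly costs implied for desk and punched card machines by the text's
    rough ratios (electronic ~100x desk, ~25x punched card)."""
    return electronic_per_hour / 100, electronic_per_hour / 25

for rate in (100, 300):     # dollars per hour of utilization, United States
    desk, cards = implied_rates(rate)
    print(f"${rate}/h electronic -> desk ~${desk:.0f}/h, punched card ~${cards:.0f}/h")
```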


4. THE ECONOMICS OF PROGRAMMING 


Once the decision to use the high-speed machines has been made, 
there remain a number of decisions to be made with regard to the par- 
ticular form of program to be adopted. For the most part the criteria 
governing the construction of programs are conflicting, and the pro- 
grammer will have to find the most efficient compromise for his pur- 
pose. In this section we discuss the five criteria which comprise the 
problem we have called the ‘economics of programming’. 

A complete program consists of all the orders necessary for a specific 
calculation such as the finding of the roots of an equation with given 
coefficients. This calculation will involve some operations that also 
arise in other programs, and for which the relevant order sequences can 
be prepared once for all. Of these we may mention as examples such 
common operations as division,⁶ evaluating trigonometric functions,
integrating a differential equation and inverting a matrix; input (read- 
ing numbers from the tape) and output (printing the results) also come 
into this category. These order sequences are known as “sub-routines”; 
they are indispensable for the efficient utilization of any automatic 
computer and are made accessible to users through a “library of sub- 
routines”. Many programs consist entirely of a few library routines 
linked together by a short “master routine” in addition to the numerical 
information. The square root program described above is strictly also 





6 Unlike some other machines, the Edsac has no built-in division.







a sub-routine, if only because it contains no input and output facilities. 
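Because the Edsac has no built-in division, division makes a natural example of a library sub-routine. One classical way to divide using only multiplication and subtraction is Newton’s iteration for the reciprocal, $r_{k+1} = r_k(2 - d\,r_k)$. The Python sketch below uses this method for illustration only; it does not describe the routine actually held in the Edsac library. Here `divide` plays the part of a short master routine that calls the `reciprocal` sub-routine. Both names are hypothetical.

```python
def reciprocal(d, tol=1e-15, max_iter=60):
    """1/d for 0.5 <= d < 1 by Newton's iteration r <- r * (2 - d * r).

    Uses only multiplication and subtraction; the error 1 - d*r is squared at each step.
    """
    if not 0.5 <= d < 1:
        raise ValueError("scale d into [0.5, 1) before calling")
    r = 1.0
    for _ in range(max_iter):
        nxt = r * (2 - d * r)
        if abs(nxt - r) < tol:
            return nxt
        r = nxt
    return r

def divide(n, d):
    """Toy 'master routine': n / d via the reciprocal sub-routine (d in [0.5, 1))."""
    return n * reciprocal(d)

print(divide(0.3, 0.75))
```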

a. Speed. There are usually many claimants for computing facilities 
on an electronic machine and users will therefore have to economize in 
the time they occupy. Furthermore, the possibility of breakdown makes 
short runs desirable. Speed is an especially important consideration in 
iterative or repetitive calculations, which frequently occur in this type 
of work. 

b. Size. The capacity of the Edsac’s high-speed memory is small, so
that ingenuity in programming is often required in order to save storage 
space and hence operating time. For instance in inverting a matrix it is 
necessary to store both the program and the elements of the matrix; 
hence the more space is taken by the program, the smaller is the order 
of the matrices which can be accommodated. This problem is now less 
important with the development of large auxiliary storage facilities on 
magnetic tape which has rapid access time. 

c. Accuracy. In all digital machines significant figures are lost
because only a limited number of digits can be used to represent a
number. A trained human computer counteracts this almost subconsciously
by shifting the decimal point so as to retain the required
number of significant figures. The Edsac library contains some “floating
point routines” which operate in the same fashion and relieve the pro- 
grammer from the difficult task of considering the magnitude of each 
number during the execution of a program. 

In all the more complicated programs it seems that the use of a gen- 
eral purpose floating decimal routine has much to commend it. It is pos- 
sible to arrange this in a form so that ordinary orders are “interpreted” 
and the operations are carried out in the machine with the appropriate 
adjustment of the decimal point. The decrease in the speed of the ma- 
chine is significant, but the construction of the program is made much 
simpler. 
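Multiplying two small numbers shows the loss of significant figures that floating point routines guard against. The Python sketch below assumes a fixed binary point with 34 fraction digits, which is our reading of a 35-digit long number with one sign digit. It compares that format with keeping 34 significant binary digits at any magnitude. Both formats are simplifications, not the Edsac library’s actual routines.

```python
import math

FRACTION_BITS = 34   # assumption: 35-digit long number = sign digit + 34 fraction digits

def to_fixed(x):
    """Round x to the nearest multiple of 2**-34, as with a fixed binary point."""
    return round(x * 2 ** FRACTION_BITS) / 2 ** FRACTION_BITS

def to_floating(x, bits=FRACTION_BITS):
    """Keep `bits` significant binary digits whatever the magnitude of x."""
    if x == 0:
        return 0.0
    m, e = math.frexp(x)                 # x = m * 2**e with 0.5 <= |m| < 1
    return math.ldexp(round(m * 2 ** bits) / 2 ** bits, e)

a, b = 3e-6, 7e-6
exact = a * b
fixed = to_fixed(to_fixed(a) * to_fixed(b))
floating = to_floating(to_floating(a) * to_floating(b))
print(exact, fixed, floating)            # the fixed-point product is lost entirely
```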

d. Range of application. Programs and especially library sub-routines 
are evidently more useful the greater is the number of specific problems 
for which they can be used. For example, if its other characteristics are 
the same, a routine which inverts all non-singular matrices is preferable 
to one which applies only to symmetric matrices. Similarly, it is desira- 
ble to have a wide range of convergence in iterative calculations. 

e. Ease of construction. Once a program or routine is finished the ef- 
fort spent on preparing and testing it is irrelevant to the user, but be- 
fore then there is frequently a choice between more and less difficult 
approaches, which may be expected to require different amounts of in- 
vestment in construction time and to yield different results in terms of 







the four criteria mentioned previously. Much of the construction time 
will be spent in testing; as experience has shown that initial errors in 
programming are almost unavoidable and often difficult to locate, the 
Edsac library contains several sub-routines for the detailed analysis of 
trial programs. 

The optimum balance between these five criteria will depend on the 
problem in hand. In the case of library sub-routines, especially the more 
common ones, it is well worth the effort to seek the most efficient meth- 
ods of solution, so that the ease of construction criterion becomes of 
minor importance. Even so it is not always possible to say in advance 
which sub-routine is the most efficient for incorporation in some future 
program. Thus no square root routine which is at the same time small, 
fast, accurate and applicable to all positive numbers has yet been found. 
The routine discussed earlier as an example is small and accurate, but
it does not work for y > .39 (as the solution x has to approach √y from
below) and converges only slowly for small values of y. It is therefore
fast only in a limited range of the argument, which may be suitable for
some programs but not for others. The Edsac library therefore con- 
tains several square root routines among which users may choose. An 
alternative may for instance be fast, accurate and generally applicable 
but occupy more storage space than our example. 
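The trade-off between square root routines can be illustrated by comparing two iterations. The first is a simple iteration that, like the routine described in the text, approaches √y from below and slows down for small y. We assume the earlier routine is of this general kind, but the routine of Section 2 is not reproduced here, and its failure for larger y is not modelled. The second is Newton's iteration. It converges quickly for any positive y but needs a division at every step. The function names are hypothetical.

```python
import math


def sqrt_from_below(y: float, tol: float = 1e-10, max_iter: int = 100_000):
    """Iterate x <- x + (y - x*x)/2 from x = 0 (assumes 0 < y < 1).

    Every iterate stays below sqrt(y) and increases towards it. Near the
    root the error shrinks by a factor of about (1 - sqrt(y)) per step, so
    the routine is short but slow when y is small. Returns (root, steps)."""
    if not 0.0 < y < 1.0:
        raise ValueError("this sketch assumes 0 < y < 1")
    x = 0.0
    for n in range(1, max_iter + 1):
        step = (y - x * x) / 2.0
        x += step
        if step < tol:
            return x, n
    raise RuntimeError("no convergence within max_iter steps")


def sqrt_newton(y: float, rel_tol: float = 1e-12):
    """Newton's iteration x <- (x + y/x)/2 started above the root.
    Converges quadratically for any y > 0. Returns (root, steps)."""
    if y < 0.0:
        raise ValueError("negative argument")
    if y == 0.0:
        return 0.0, 0
    x = max(y, 1.0)  # any start >= sqrt(y) converges monotonically from above
    n = 0
    while True:
        n += 1
        nxt = 0.5 * (x + y / x)   # one division per step
        if abs(nxt - x) <= rel_tol * nxt:
            return nxt, n
        x = nxt
```

For y = 0.25 the first routine needs a few dozen steps, while for y = 0.0001 it needs well over a thousand. Newton's iteration reaches y = 0.0001 in about a dozen steps. This mirrors the text's point that no single routine is best in every respect.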

Breadth of application is traded for size in operations on symmetric
matrices, where much memory space can be saved by storing only the
elements on and below the main diagonal. In lengthy
calculations the maintenance of accuracy is often the dominant prob- 
lem and it may be necessary to adopt floating point techniques as 
pointed out above, even though they reduce the speed and considerably 
increase the number of orders. Sometimes the programmer may not be 
much interested in speed or size and prefer the least arduous way of ar- 
riving at a working program, so that the fifth criterion becomes de- 
cisive. 
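The storage saving for symmetric matrices comes from keeping only the k(k+1)/2 elements on and below the diagonal in a single array. Below is a minimal sketch of the index arithmetic. It assumes row-by-row packing and 0-based indices, since the text does not state the Edsac routines' actual layout. The helper names are hypothetical.

```python
def packed_index(i: int, j: int) -> int:
    """Position of element (i, j) of a symmetric matrix when only the
    elements on and below the main diagonal are stored, row by row."""
    if j > i:              # use symmetry: a[i][j] == a[j][i]
        i, j = j, i
    return i * (i + 1) // 2 + j


def pack(a):
    """Keep rows 0..k-1, each only up to and including its diagonal element."""
    k = len(a)
    return [a[i][j] for i in range(k) for j in range(i + 1)]


def unpack_element(packed, i, j):
    return packed[packed_index(i, j)]
```

For a matrix of order 18, the largest the text says the Edsac could invert, this stores 171 numbers instead of 324. In exchange, the routine serves only symmetric matrices.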

As a result of technical progress in electronic computation some of 
these criteria may change in importance. The operating speed of some 
of the most recent large-size computers exceeds that of the Edsac by 
a factor of five or ten. These machines frequently also have an auxiliary 
“slow” memory in addition to the electronic memory, the former being 
used to store parts of the program that are not at that time being car- 
ried out. There is, however, also a contrary development towards less 
ambitious electronic computers that can be produced commercially at 
a fraction of the cost of the larger and faster constructions; on these
machines speed and size will be matters of great concern to the programmer.







ELECTRONIC COMPUTATION IN ECONOMIC STATISTICS 423 


5. APPLICATIONS TO ECONOMIC STATISTICS 


We now turn to consider the applications of the electronic computer 
to econometric problems citing examples that have so far been pro- 
grammed for the Edsac. The main work has been concerned with re- 
gression analysis and least squares procedures but the programs de- 
veloped for these purposes have found other applications as well. 


5.1 The Moment Matrix


The first problem to be solved was that of finding a suitable method 
for computing the moment matrix of a number of variables. The prin- 
cipal problem that arises here is one of size, as storing all the numerical 
information at one time may well exceed the capacity of the memory. 
A convenient solution is to arrange operations so that the program or 
the numerical information (or both) need only be taken in as they are 
wanted. The computation of a moment matrix is equivalent to multiplying
the matrix of observations by its transpose, and for this it is necessary
to take into the store only one row of the matrix at a time and then
(a) form the cross-products of its elements,⁷ (b) add them to the
previous partial sums of the corresponding cross-products, and finally
(c) proceed to the next row of observations. The size of the store then
sets no limit to the number of observations that can be taken into
account, but the number of variables may not exceed 25 with the present
size of the Edsac store.
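Steps (a) to (c) above can be sketched as follows. Python stands in for Edsac orders here. Only the accumulation logic is meant to correspond to the text, and the function name is ours. Because `rows` may be any iterable, such as a generator reading observations from a file, memory use depends only on the number of variables k and not on the number of observations.

```python
from typing import Iterable, List, Sequence


def moment_matrix(rows: Iterable[Sequence[float]], k: int) -> List[float]:
    """Accumulate the k(k+1)/2 distinct sums of squares and cross-products
    sum_t x_ti * x_tj (j <= i), reading one observation row at a time.
    The result is stored row by row, lower triangle only."""
    sums = [0.0] * (k * (k + 1) // 2)
    for row in rows:                       # (c) proceed to the next row
        if len(row) != k:
            raise ValueError("every row must contain k values")
        pos = 0
        for i in range(k):
            xi = row[i]
            for j in range(i + 1):         # (a) form the cross-products
                sums[pos] += xi * row[j]   # (b) add to the partial sums
                pos += 1
    return sums
```

For the three observations (1, 2), (3, 4), (5, 6), the function returns the sums 35, 44 and 56, corresponding to x₁², x₁x₂ and x₂².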

The foregoing is a very condensed description of the simplest statis- 
tical sub-routine in the Edsac library; there is also a slightly more 
complicated routine which gives weights to observations as is required 
if they are derived from grouped data. In the case of weighted regres- 
sions the gain in time is particularly impressive because most desk 
calculators are not well suited for the accumulation of triple products. 
It takes about 7 minutes on the Edsac to compute all the 55 weighted 
sums of squares and cross-products of 10 variables with 40 observations 
in addition to about 4 hours for punching and checking the number 
tape and verifying the results by a sum-check. A human computer with 
an electric desk machine would probably need about 75 hours for this 
job, so that 71 hours of labor are replaced by 7 minutes of machine time. 
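Weighting by group frequencies turns each cross-product into a triple product w·xᵢ·xⱼ. The sketch below also shows one classical form of sum-check, which we assume is the kind meant here. If sₜ is the total of row t, then every row of the full symmetric moment matrix must add up to Σₜ wₜ xₜᵢ sₜ. An error in any single off-diagonal element therefore shows up in two of these row checks. The function names are hypothetical.

```python
from typing import Iterable, List, Sequence, Tuple


def weighted_moments_with_check(
    rows: Iterable[Sequence[float]], weights: Iterable[float], k: int
) -> Tuple[List[List[float]], List[float]]:
    """Weighted sums sum_t w_t * x_ti * x_tj (the triple products), kept as a
    lower triangle, plus a check column sum_t w_t * x_ti * s_t, where s_t is
    the total of row t."""
    m = [[0.0] * (i + 1) for i in range(k)]
    check = [0.0] * k
    for row, w in zip(rows, weights):
        s = sum(row)
        for i in range(k):
            wx = w * row[i]
            for j in range(i + 1):
                m[i][j] += wx * row[j]
            check[i] += wx * s
    return m, check


def sum_check_ok(m, check, rel_tol: float = 1e-9) -> bool:
    """Each row of the full symmetric matrix must add up to its check value."""
    k = len(m)
    for i in range(k):
        row_total = sum(m[i][j] if j <= i else m[j][i] for j in range(k))
        if abs(row_total - check[i]) > rel_tol * max(1.0, abs(check[i])):
            return False
    return True
```

With 10 variables the lower triangle holds the 55 sums mentioned in the text. With 40 observations that means 55 × 40 = 2,200 triple products, which is tedious on a desk machine.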


5.2 The Inversion of the Moment Matrix 


The next step in regression analysis is the inversion of the moment 
matrix and requires a much more complicated routine; such a routine 





7 If there are k variables only k(k+1)/2 cross-products need be formed as the moment matrix is 
symmetric. 







applies, of course, to all symmetric positive-definite matrices. Among 
the various methods of inverting matrices one had to be chosen which 
(a) could be split into successive stages so as to save storage space and 
(b) avoided divisions as much as possible so as to save time. We selected
the so-called Choleski method,⁸ which involves converting the original
matrix into the product of a lower triangular matrix and its transpose,
inverting the triangular matrix, and multiplying the inverse triangular
matrix by its transpose to give the inverse of the original matrix.
The three phases can be programmed separately and do not require
the storing of any intermediate results; at each stage only k(k+1)/2
numbers (k being the order of the matrix) have to be in the memory.
The only non-linear operation is the taking of k inverse square roots,
which is done by a slight modification of the square root routine
discussed in Section 2 above. By operating in floating point form the
accuracy of the inverse is kept at about 7 significant decimal figures.
The routine works at a very satisfactory speed; for instance it takes
about five minutes to invert an 11 × 11 matrix, and as much again to
re-invert the result as a check. Only one half of this time is spent in
actual computation; the remainder is used for reading the program and
the original matrix, and for printing the inverse. At present the Edsac
can deal with matrices up to the eighteenth order.

In the case of matrices of orders less than thirteen it is not necessary
to split up the program for input purposes, and reading time can be
saved by leaving it in the memory in its entirety. This makes it possible
to invert several matrices in succession without interruptions for
program input. In this way twenty 5 × 5 matrices could be inverted and
re-inverted in 30 minutes, that is to say, 45 seconds for each inversion.
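The three phases of the Choleski method can be sketched in Python. Writing the matrix as A = LLᵀ with L lower triangular, the inverse is A⁻¹ = (L⁻¹)ᵀL⁻¹. The sketch keeps the reciprocals of the diagonal elements of L, so the only divisions are the k reciprocals that make up the inverse square roots, in the spirit of requirement (b). Unlike the Edsac routine, it uses ordinary floating point and full rather than packed storage for clarity. The function names are ours.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def choleski_factor(a: Matrix) -> Tuple[Matrix, List[float]]:
    """Phase 1: factor a = L L^T. Returns L and the reciprocals 1/L[j][j];
    each column needs one inverse square root and no other division."""
    k = len(a)
    L = [[0.0] * k for _ in range(k)]
    inv_diag = [0.0] * k
    for j in range(k):
        d = a[j][j] - sum(L[j][p] ** 2 for p in range(j))
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        r = 1.0 / math.sqrt(d)             # the inverse square root
        inv_diag[j] = r
        L[j][j] = d * r                    # equals sqrt(d)
        for i in range(j + 1, k):
            L[i][j] = (a[i][j] - sum(L[i][p] * L[j][p] for p in range(j))) * r
    return L, inv_diag


def invert_lower(L: Matrix, inv_diag: List[float]) -> Matrix:
    """Phase 2: invert the lower triangular L (the result is lower triangular)."""
    k = len(L)
    M = [[0.0] * k for _ in range(k)]
    for j in range(k):
        M[j][j] = inv_diag[j]
        for i in range(j + 1, k):
            M[i][j] = -inv_diag[i] * sum(L[i][p] * M[p][j] for p in range(j, i))
    return M


def choleski_inverse(a: Matrix) -> Matrix:
    """Phase 3: a^{-1} = (L^{-1})^T L^{-1}."""
    L, inv_diag = choleski_factor(a)
    M = invert_lower(L, inv_diag)
    k = len(a)
    return [[sum(M[p][i] * M[p][j] for p in range(max(i, j), k))
             for j in range(k)] for i in range(k)]
```

Applying `choleski_inverse` twice should return the original matrix to within rounding error. This is the re-inversion check described above.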


5.3 Special Routines for Linear Regression 


By combining these routines most of the computations in regression 
analysis can be performed automatically. For particular problems it is 
on occasion worth taking a further step. Thus for its work on family
budgets⁹ the Department of Applied Economics in Cambridge has de-
veloped a number of programs which derive the parameters of different 
types of Engel curves (with their standard errors) directly from the 
basic data. The development of these programs is economical since the 
main burden of the work is to find the regressions of about 150 varia- 
bles—the expenditures on various commodities—on a few “fixed” vari- 
ables—household size and income, and the like. The economical solu- 





8 See Fox and others [5], or more conveniently Dwyer [3] p. 196. 
9 See Houthakker [8].







tion in this case is to keep the values of the “fixed” variables in the 
store and then read in the other variables one at a time. A further ra- 
tionalization might be achieved if the Edsac could take its numerical 
input from standard punched cards instead of from a specially punched 
paper tape, but this facility has not yet been arranged for the Edsac. 
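The economy of keeping the “fixed” variables in the store follows from the algebra of least squares. The inverse of X′X depends only on the fixed variables, so it can be formed once, and each commodity then needs only its own cross-products X′y. The sketch below assumes a single fixed variable plus a constant term. The Cambridge programs used several fixed variables and several forms of Engel curve, and those details are not reproduced. It returns the coefficients with their standard errors. The names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Fixed = Tuple[Sequence[float], List[List[float]]]


def prepare_fixed(x: Sequence[float]) -> Fixed:
    """Done once: invert X'X for X = [1, x] (a constant and one fixed variable)."""
    n = len(x)
    sx = sum(x)
    sxx = sum(v * v for v in x)
    det = n * sxx - sx * sx                      # determinant of X'X
    inv = [[sxx / det, -sx / det],
           [-sx / det, n / det]]
    return x, inv


def regress_on_fixed(fixed: Fixed, y: Sequence[float]):
    """Done for each dependent variable as it is read in.
    Returns (a, b, se_a, se_b) for the fitted line y = a + b*x."""
    x, inv = fixed
    n = len(x)
    sy = sum(y)                                  # first element of X'y
    sxy = sum(xi * yi for xi, yi in zip(x, y))   # second element of X'y
    a = inv[0][0] * sy + inv[0][1] * sxy
    b = inv[1][0] * sy + inv[1][1] * sxy
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    s2 = rss / (n - 2)                           # residual variance
    return a, b, math.sqrt(s2 * inv[0][0]), math.sqrt(s2 * inv[1][1])
```

In use, `prepare_fixed` would be called once with the fixed variable. `regress_on_fixed` would then be called once for each expenditure series as it is read in.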


5.4 Non-Linear Regression 


The value of electronic computation for statistical research is shown
even more clearly in the case of calculations that would not be
attempted without its aid. Least-squares regression analysis has
customarily been confined to formulae where the parameters (possibly after
some transformation) enter linearly, so that the normal equations are
linear and can be solved by classical methods. Although this approach
is no doubt satisfactory in most investigations, it has proved too
restrictive for some special problems, particularly the estimation of
“unit-consumer scales” in the analysis of family expenditure.¹⁰ The equations
used here are of the type


$$y = a + b \log \sum_{i=1}^{n} c_i x_i \tag{1}$$

and

$$y = \sum_{i=1}^{n} c_i x_i \,(a + b x_{n+1}) \tag{2}$$

where $y, x_1, \ldots, x_{n+1}$ are observed variables and $a, b, c_1, \ldots, c_n$ are
to be estimated (one of the $c_i$ is put equal to unity). With the aid of the
Edsac this problem has been attacked by an iterative method, that is 
the parameters are adjusted so as to minimize the residual sum of 
squares directly without having recourse to the normal equations 
which result from the usual minimization procedure. The main feature 
of the method is that it is necessary to guess an initial approximation 
to the correct value and then adjust it upwards or downwards by suc- 
cessively decreasing intervals so that it converges to a value which 
minimizes the residual sum of squares. This procedure is practicable 
for the above equations since they can be transformed so that this 
process of adjustment is required for no more than two of the
parameters, the remainder being estimated in the usual way.
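The search by successively decreasing intervals can be sketched for equation (1) with n = 2 and c₁ = 1. Only c₂ then has to be found by adjustment, while a and b come from an ordinary least-squares fit at each trial value. The step-halving rule, the starting value and the tolerance are our assumptions, since the text does not give the Edsac program's exact rules. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def ols_line(z: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least squares for y = a + b*z; returns (a, b, residual SS)."""
    n = len(z)
    zbar, ybar = sum(z) / n, sum(y) / n
    szz = sum((zi - zbar) ** 2 for zi in z)
    szy = sum((zi - zbar) * (yi - ybar) for zi, yi in zip(z, y))
    b = szy / szz
    a = ybar - b * zbar
    rss = sum((yi - a - b * zi) ** 2 for zi, yi in zip(z, y))
    return a, b, rss


def transformed(c: float, x1, x2) -> List[float]:
    """z = log(x1 + c*x2), i.e. equation (1) with n = 2 and c_1 = 1."""
    return [math.log(u + c * v) for u, v in zip(x1, x2)]


def rss_at(c: float, x1, x2, y) -> float:
    if any(u + c * v <= 0 for u, v in zip(x1, x2)):
        return math.inf                  # logarithm undefined: reject this c
    return ols_line(transformed(c, x1, x2), y)[2]


def fit_by_step_halving(x1, x2, y, c=1.0, step=0.5, tol=1e-8):
    """Move c up or down by `step` while that lowers the residual sum of
    squares; when neither direction helps, halve the step."""
    best = rss_at(c, x1, x2, y)
    while step > tol:
        for trial in (c + step, c - step):
            r = rss_at(trial, x1, x2, y)
            if r < best:
                c, best = trial, r
                break
        else:                            # no improvement in either direction
            step /= 2
    a, b, rss = ols_line(transformed(c, x1, x2), y)
    return a, b, c, rss
```

Like any such search, this finds a local minimum of the residual sum of squares near the starting guess, which is why the text stresses the need for a good initial approximation.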


5.5 The Simultaneous Equations Approach to Econometric Models 


A potentially fruitful field of application for high-speed computers
is the simultaneous equations approach to regression analysis,¹¹ which is





10 See Houthakker [8] esp. Section 5.4.4.
11 See Koopmans and others [10] Section 4.







the most recent development in the technique of econometrics. Of the 
two methods of estimation used, the “information preserving” or “full 
maximum likelihood” method is in practice too laborious for non-electronic
equipment, and even the “limited information” or “reduced form”
approach¹² requires much more computation than classical regression.
No specific programs on this subject have been worked out, but the
programs outlined above, especially that for the inversion of matrices,
have been found useful in a number of applications.¹³


5.6 Other Statistical Applications 


Finally we mention two further fields of application. These are, first,
the recent developments in input-output analysis, the central problem
of which is the inversion of matrices of large order whose elements are
non-negative, the diagonal elements being relatively heavy.¹⁴ So far the
method adopted has been to invert the symmetric matrix A’A, where 
A is the (non-symmetric) matrix whose inverse is required, and then 
postmultiply the inverse by A’; thus 


$$(A'A)^{-1}A' = A^{-1}(A')^{-1}A' = A^{-1}$$
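The identity can be checked numerically. The sketch below uses plain Python lists and a small Choleski inverse for the symmetric matrix A′A to compute A⁻¹ as (A′A)⁻¹A′. The helper names and the test matrix are our own illustrations.

```python
import math
from typing import List

Matrix = List[List[float]]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def spd_inverse(s: Matrix) -> Matrix:
    """Inverse of a symmetric positive-definite matrix via s = L L^T."""
    k = len(s)
    L = [[0.0] * k for _ in range(k)]
    for j in range(k):
        L[j][j] = math.sqrt(s[j][j] - sum(L[j][p] ** 2 for p in range(j)))
        for i in range(j + 1, k):
            L[i][j] = (s[i][j] - sum(L[i][p] * L[j][p] for p in range(j))) / L[j][j]
    M = [[0.0] * k for _ in range(k)]    # M = L^{-1}, lower triangular
    for j in range(k):
        M[j][j] = 1.0 / L[j][j]
        for i in range(j + 1, k):
            M[i][j] = -sum(L[i][p] * M[p][j] for p in range(j, i)) / L[i][i]
    return matmul(transpose(M), M)       # s^{-1} = M^T M


def inverse_via_normal_matrix(a: Matrix) -> Matrix:
    """A^{-1} computed as (A'A)^{-1} A' for square, non-singular A."""
    at = transpose(a)
    return matmul(spd_inverse(matmul(at, a)), at)
```

The attraction of this method is that a routine for symmetric positive-definite matrices can be reused for any non-singular A. The cost, which the text does not discuss, is that forming A′A squares the condition number of the problem. As a result, significant figures are lost when A is nearly singular.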


The second field is that known as the Monte-Carlo method which has 
received considerable attention in the United States. In this method,
numbers generated by random sampling from an appropriate probability
distribution are used to compute estimates of the desired quantity, and
these estimates converge to the true solution as the number of samples
grows. Reference should be made to [12] for an account of this method.
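Below is a toy illustration of the principle, not one of the applications described in [12]. The integral of f over [0, 1] is estimated by the average of f at uniformly distributed random points. The estimate converges to the true value as the number of points grows, with a standard error falling like 1/√n. The function name is hypothetical.

```python
import random
from typing import Callable


def monte_carlo_integral(f: Callable[[float], float], n: int, seed: int = 1) -> float:
    """Estimate the integral of f over [0, 1] by the mean of f at n
    uniformly distributed random points (standard error ~ 1/sqrt(n))."""
    rng = random.Random(seed)          # seeded so that runs are repeatable
    return sum(f(rng.random()) for _ in range(n)) / n
```

For example, `monte_carlo_integral(lambda x: x * x, 100_000)` comes out close to the exact value 1/3.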


6. SUMMARY 


In Section 5 of this paper we have given some examples of the way 
in which most of the arduous and mechanical portions of a piece of 
econometric analysis can, with the aid of a relatively small number of 
programs, be reduced to purely automatic processes suitable for elec- 
tronic computation. By these means the time interval between the 
conception of a hypothesis and its testing against observational data 
can be substantially reduced, and the research worker can devote a 
greater proportion of his time to the problems of interpretation and the 
formulation of concepts. Thus in econometrics it is frequently difficult 





12 See Anderson & Rubin [1]. 

13 See M. R. Fisher [4]. 

14 For an alternative method of inverting these matrices which is also well suited for electronic
computation, see Waugh [13] or more generally the paper by Leontief [11]. 







to decide in advance which variables should be included in a regression 
analysis, or if the nature of the variables is known what lags should be 
introduced. If an electronic computer is available, various specifica- 
tions can be tried out and a selection made on the basis of the results. 
Further, the restriction of linear regression equations can be overcome 
without too much difficulty. 

In spite of the superiority in speed and flexibility of the large elec- 
tronic machines over conventional punched card machines, it seems at 
present that for the storage and summarization of a large amount of 
data such as are obtained from budgetary surveys, the latter machines 
may still be preferred. In such a case the existence of a punched card 
to paper tape converter will minimize the difficulty of transferring the 
summarized information to the high-speed machine for further analysis. 

At present the use of high speed computers in statistics is still re- 
stricted by the small number of available machines and the novelty of 
the techniques necessary to operate them. In these circumstances the 
statistician who is fortunate enough to gain access to a machine will prefer to
devote most of the time he can spare for programming to the con- 
struction of programs and sub-routines with a wide validity. The for- 
mulation of statistical problems in terms of operations in matrix algebra 
is particularly helpful since modern high-speed machines are admirably 
suited to the multiplication and inversion of matrices of moderate size. 


We have, finally, to express our thanks to Dr. M. V. Wilkes of the 
Cambridge University Mathematical Laboratory for granting us ac- 
cess to the Edsac and to the many members of the Staff of the Labora- 
tory for their help and cooperation. 


REFERENCES 


[1] Anderson, T. W., and Rubin, H., “Estimation of the parameters of a simple 
equation in a complete system of stochastic equations,” Annals of Mathe- 
matical Statistics, 20 (1949), 46-63. 

[2] Bowden, B. V., “The application of calculating machines to business and 
commerce” in Manchester University Computer: Inaugural Conference (Fer- 
ranti, 1951), 30-31. 

[3] Dwyer, Paul S., Linear Computations. New York: Wiley, 1951. 

[4] Fisher, M. R., “A study of the U.S. poultry industry using the limited in- 
formation method,” a paper read to the 14th European meeting of the 
Econometric Society at Cambridge, August 1952. 

[5] Fox, L., Huskey, H. D., and Wilkinson, J. H., “Notes on the solution of 
algebraic linear simultaneous equations,” Quarterly Journal of Mechanics 
and Applied Mathematics, 1 (1948), 149-73. 





15 For an example see [7] p. 366. Another investigation in which the Edsac was very helpful is to
be found in [9]. 







[6] Hartree, D. R., Numerical Analysis. Oxford: Oxford University Press, 1952.

[7] Houthakker, H. S., “Some calculations on electricity consumption in Great
Britain,” Journal of the Royal Statistical Society, Series A, 114 (1951).

[8] Houthakker, H. S., “The econometrics of family budgets,” Journal of the
Royal Statistical Society, Series A, 115 (1952), 1-21.

[9] Houthakker, H. S., and Tobin, James, “Estimates of the free demand for
rationed foodstuffs,” Economic Journal, 62 (1952), 103-18.

[10] Koopmans, T. C., Rubin, H. and Leipnik, R. B., “Measuring the equation 
systems of dynamic economics” in Koopmans (ed.) Statistical Inference in 
Dynamic Economic Models. New York: Wiley, 1950. 

[11] Leontief, Wassily W., “Computational problems arising in connection with
economic analysis of industrial relationships” in Proceedings of a
Symposium on Large-Scale Digital Calculating Machinery. Harvard University
Press, 1948, 169-75.

[12] National Bureau of Standards, Monte Carlo Method (Applied Mathematics 
Series 12). Washington: U. S. Government Printing Office, 1951. 

[13] Waugh, Frederick V., “Inversion of the Leontief matrix by power series,” 
Econometrica, 18 (1950), 142-54. 

[14] Wheeler, D. J., “Program organization and initial orders for the EDSAC,” 
Proceedings of the Royal Society A, 202 (1950), 573. 

[15] Wilkes, M. V., “Automatic Calculating Machines,” Journal of the Royal 
Society of Arts, 14th December, 1951, 56-90. 

[16] Wilkes, M. V., and Renwick, W., “The EDSAC—an electronic calculating 
machine,” Journal of Scientific Instruments, 26 (1949), 385. 

[17] Wilkes, M. V., Wheeler, D. J., and Gill, S., Programing for an Electronic Digital
Computer. Cambridge, Mass.: Addison Wesley, 1951.

[18] Yates, Frank, Sampling Methods for Censuses and Surveys. London: Griffin, 
1949. 





THE ELEMENTS OF AN INDUSTRIAL 
CLASSIFICATION POLICY* 


Walt R. Simmons
U. S. Bureau of Labor Statistics 


In a recent publication, the Bureau of Labor Statistics reported that
242,000 workers were employed in the General Industrial Machinery
industry in the United States, and that employment in the industry, down
one percent from the previous month, had increased six percent over the
same month a year ago. In another release the Census Bureau shows 
retail sales of 817 million dollars for one month in the Apparel indus- 
try. That release states further: “Among apparel stores, which as a 
group showed no change in June 1952 compared with June 1951, 
women’s ready-to-wear stores showed sales up 3 percent, while men’s 
and boy’s clothing stores showed a decrease of 3 percent.” Similar 
statements appear daily in the publications of statistical agencies. In 
the literal sense, “What is the meaning of these statistics about indus- 
trial and commercial activities?” 

A direct answer to the question is that these statistics are the end 
product of particular surveys conducted under particular sets of con- 
ditions and particular procedures. That answer is precise and correct. 
It carries, however, some of the flavor of the remark of the villager who, 
when asked why the railway depot was so far from the town square re- 
plied, “Because that’s where the trains always stop.” 

Now a statistical survey is subject to many hazards. Dr. Deming 
has listed 19 sources of error in one compilation. Others have added 
to that list. The last decade has seen real progress in identifying, iso- 
lating, and measuring the effect of such hazards as error and variability 
of reply to questionnaire, non-response bias, processing error, and sam- 
pling variability. Every step forward in this direction enriches the an- 
swer to the question, “What is the meaning of the data?” 

Some of us who spend much of our time analyzing statistics from the 
methodological and the procedural side are of the opinion that one of 
the most significant sources of potential error or ambiguity, especially 
in data reported by business establishments, is to be found in the in- 
dustrial classification of those data. It is not my intention to argue 
that classification is necessarily the greatest of all survey hazards—in- 
deed this is not the occasion for ranking the components of survey er- 





* A paper presented as part of the program on “The Meaning of Statistics Classified by Industry,”
at the annual meeting of the American Statistical Association, Chicago, Illinois, December 29, 1952.




ror. But before undertaking a discussion of classification policy and 
its impact on statistics, I should like to offer some evidence which testi- 
fies to the high respect to which industrial classification is entitled as a 
builder of statistics, both good and bad. 

1. Consider the statistic to which I referred a moment ago concern- 
ing employment in the General Industrial Machinery industry. If the 
definition of that industry had included agricultural machinery—a not 
unreasonable possibility—employment for the industry would have 
been 427,000 instead of 242,000, a figure 76 percent larger. Incidentally,
the change over the year would have been one percent rather than the
six percent which was noted. Similar situations are common for statis- 
tics on production, sales, or other items. 

2. Suppose one plant in the machinery industries employed approxi- 
mately 25,000 workers. With the present development and widespread 
knowledge of what is meant by the term “number of workers em- 
ployed,” it is most unlikely that any reporting or processing error in- 
volving these data would exceed ten percent of the true total, and much 
more likely that any deviation from the true figure would be less than 
1 percent of the plant total, or less than one tenth of one percent of the
grand total for the General Industrial Machinery industry. On the other
hand, the decision to classify this plant into the industry, or into
another, affects the General Industrial Machinery totals by a full 10
percent.

3. More than a year ago, an Interagency Committee, under the 
chairmanship of the Federal Bureau of the Budget, was formed for the 
purpose of analyzing and coordinating the several employment figures 
which then existed. A very considerable number of man-hours has been 
devoted to that task—and I am pleased to say that real progress has 
been made. No cost records of this reconciliation task are available, but 
it is certainly a conservative statement that it has been
necessary to devote more than 90 percent of the total effort to differ- 
ences arising from industrial classification policy and practice. 
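The comparison in points 1 and 2 rests on simple ratios of the quoted figures. These are the effect of the broader definition, the share of the single large plant, and a 1 percent reporting error in that plant relative to the industry total:

$$
\frac{427{,}000}{242{,}000} - 1 \approx 0.764, \qquad
\frac{25{,}000}{242{,}000} \approx 0.103, \qquad
\frac{0.01 \times 25{,}000}{242{,}000} \approx 0.001 .
$$

So the classification of one plant moves the industry total about a hundred times as much as a plausible reporting error in that plant's figure.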

Without further belaboring this matter of the transcending influence 
of industrial classification on establishment statistics, I shall proceed 
with an analysis of classification policy. 


Purpose and Objective of Industry Classification 


Industry Classification has two fundamental objectives and perhaps 
five essential supporting specifications. 

A Tool for the Management of Data. The primary purpose of Indus- 
trial Classification is to provide a system for organizing data into under- 





AN INDUSTRIAL CLASSIFICATION POLICY 431 


standable, manageable blocks of information. It is a procedure for sim- 
plifying the collection, processing, analysis, and presentation of data. 
At every step of a survey, beginning with the formulation of goals and
extending through the assembly of a mailing list, the design of the
questionnaire and of the sample if there is one, the collection,
verification, and tabulation of data, and reaching finally the analysis
of findings and their publication, we find that classification of
materials into meaningful, convenient industrial categories is an all
but indispensable aid.

The Class Definition. In a somewhat different sense, an equally im- 
portant purpose of industrial classification is to establish the boundary 
of the category to which given statistics on industrial activities relate, 
and to define the content of that category. In its simplest terms, the 
classification must say for every recognized category, what is included, 
what excluded. Recall the statistics mentioned earlier on sales by Ap- 
parel Stores. The classification must define an Apparel Store. It must 
answer such questions as, Are the Clothing Sections of large depart- 
ment stores included? Are shoes apparel? If yes, does the answer extend 
to shoe repair shops? How about costume jewelry? Does the term in- 
clude wholesale outlets? Does it encompass second-hand stores? Does 
it include importers? Is a tailor shop an apparel store? Does the cate- 
gory include all or a part of the store which sells both dry goods and 
apparel, or groceries and apparel? How are the Naval Clothing Factory 
and Naval Small Stores treated? We could continue for a good while. 
The definitional job is not endless, but neither is it a light task. 

These are the prime purposes of industrial classification: To provide 
a tool for the management of data, and to define the category to which 
the statistics about which we may be talking relate. The discipline fails 
its objective, however, unless it satisfies five additional specifications. 

The first of these already has been implied: it is that the categories 
which are defined must be meaningful in the judgment of users of the 
data. There is obviously no unique way of fulfilling this condition, in view of
the widely different specific desires of different users and the subjective
nature of the concept. Nevertheless, it is clear that some formulations 
of classes would result in nonsense categories, while others would con- 
form to the opinions of large numbers of persons as to what categories 
are appropriate. 

The second specification is that the classes must be formed in such a 
manner that basic data for those classes either are readily available or 
can be made so at tolerable cost. A corollary of this proposition says 
that the distinctions which separate one class from another must be 





432 AMERICAN STATISTICAL ASSOCIATION JOURNAL, SEPTEMBER 10953 


readily understandable and capable of easy application. Although avail- 
ability is relative, availability in terms of cost is a real thing and can be 
weighed. For data reported by industrial and commercial businesses, 
availability usually, although not always, means existence in written 
records of the company. Often it means existence in a particular type 
of record. For example, it would be desirable to classify all activity of 
State and local governments into each of the industrial activities which 
are recognized for private enterprise. The day may come when our 
society considers this action worth the cost. But today the records nec- 
essary for reporting such information simply do not exist. The necessity 
for existence of information in a particular type of record is illustrated 
in the printing industry. It would be desirable to have statistics on in- 
ventory, capital expenditure, and earnings separately for all job print- 
ing and for newspapers. In the United States there are many plants 
which are combined newspapers and job shops. The great majority of 
these have sales records which distinguish between the job work and 
newspaper; many have cost records which permit computation of price 
estimates for customers; but very few can separate the two activities 
in their payroll records, nor in their stock-on-hand or capital equipment 
accounts. Therefore it is realistic to recognize combined newspapers 
and job shops as a single, indivisible industrial category.

A third essential specification for an industrial classification and pol- 
icy is that they foster continuity in statistics over time. The classifica- 
tion must have at least two characteristics in order to accomplish this 
function. It should seek to identify classes of business establishments 
which are relatively stable, and it should change the definition of a 
class only when there are compelling reasons for doing so. I would sug- 
gest that a business establishment is stable with respect to classifica- 
tion if normally it remains in the same classification for at least a 12- 
month period. For example, it may be appropriate to distinguish be- 
tween factories which manufacture footwear, and those which make 
luggage. It would be unwise to attempt to distinguish between fac- 
tories which make shoes with leather heels and soles and those which 
make shoes with rubber heels or soles, since the same factory typically 
does both on different days, or even on the same day. Clearly a statis- 
tical series suffers a break in continuity of greater or lesser severity 
each time the class it measures is redefined. Consideration at this point 
and at many others in the field of practical classification must be given 
to the decisions of yesterday as well as the evidence of today. 

Good industrial classification must also promote comparability of 
statistics. Comparability is a blanketing concept which I do not propose
to treat in detail in this paper. Let me merely exhibit some of its
facets so that we may have a feeling for this aspect of classification 
policy. Comparability requires that the collecting agencies, the re- 
porting companies, and statisticians and users in general interpret and 
execute classification in a common way. It requires definitions and prac- 
tices which apply alike to all the principal statistics which are classi- 
fied—to such items as employment, wages, hours of work, production, 
sales, inventory, capital expenditure, capacity, claims for benefits, ma- 
terials used, placement of workers on jobs, and taxes paid. Comparabil- 
ity also connotes balance, in the sense that both in the structure of a 
classification and in its use, there should exist a tendency to give simi- 
lar attention to activities of similar importance. It would, for example, 
be inappropriate to build a code which gave equal importance in the 
United States to the manufacture of transportation equipment and the 
manufacture of umbrellas. This of course does not mean that the manu- 
facture of umbrellas might not be recognized as a sub-class. 

Finally, the classification must be exhaustive in the sense that every 
activity encountered in the economic world must be classifiable into 
one or another of the defined categories. Preferably all miscellaneous 
or “not elsewhere classified” groups should be kept small. 

We have looked at the over-all purpose and objective of a classifica- 
tion and policy, and at the leading additional specifications which must 
be met. I’d like now to come to closer grips with a specific problem: to 
identify the components of a classification policy, to determine what 
decisions must be made in order that a classification policy shall exist. 
The advantage of this course must be nearly self-evident. A problem 
can be solved only when the problem itself is understood. In a classifi- 
cation program there is great danger that day-to-day horseback rulings 
made without reference to a rulebook and related principles would lead 
to confusion, ambiguity, and inconsistency in statistics. It is to this 
identification and formulation process that I have given the title, “The 
elements of a classification policy.” 


THE ELEMENTS OF AN INDUSTRIAL CLASSIFICATION POLICY 


1. The Formal System. The largest single component of policy is the 
List of Categories which constitute the classification. The List should 
include not only the titles of each recognized category, but definitions 
of those categories, a statement of the principles upon which the classes 
were formed, and a coding system which permits easy identification of
categories and to some degree relates individual categories to one an- 
other and to the whole industrial economy. I shall say little regarding 





434 AMERICAN STATISTICAL ASSOCIATION JOURNAL, SEPTEMBER 1933 


the List for two reasons: (a) It is such a very extensive topic that | 
could not do it justice in the space that is available, and (b) This part 
of the subject has received much more attention over the past two 
decades than some of the other elements, and, I think, has been com- 
petently handled. In fact this Association only last year elected to fel- 
lowship, V. S. Kolesnikoff of the U. 8S. Bureau of the Budget for excel- 
lence of work on industrial classification standards. I do not wish to 
pass this critical component, however, without taking note of several 
cardinal points. The first is that The Standard Industrial Classification 
Manual, developed through interagency committees of the Federal 
government in cooperation with trade associations, labor unions, re- 
search groups, and other organizations is a List which is widely used 
by the Federal agencies and to an increasing extent by other bodies. I 
should like to urge all persons who are able to do so to extend its use, 
and to take an active part in bringing about further improvements in 
the SIC manual. The Economic and Social Council of the United Na- 
tions has adopted a Standard Industrial Classification of all Economic 
Activities, which is similar to the SIC, and has recommended its use to 
all member nations. 

Both the UN classification and the SIC, it should be made clear, are 
classifications by industries, and not by occupations, or by commodi- 
ties. 

Another difficult-to-apply, but basic and pervasive principle of the 
SIC is that the classification must conform to the existing structure of 
American industry. 

As we move to consideration of the other elements of classification 
policy we shall note that the List and the other elements are not en- 
tirely independent of one another; there is interaction among the ele- 
ments. 

2. The Mode of Classification. The second element of policy is inti- 
mately related to the List but is worthy of special note because it bears 
so forcefully on still other elements. This is the choice of mode of classi- 
fication; i.e., selection of the leading concept which is to guide us in 
characterizing a group of activities as an industry. The dominant view 
is that the primary objective should be to create classes which tend to 
be homogeneous in their response to economic stimuli, and that the 
product or service which is brought into the market is generally best for
this purpose. The name “nature of business activity” is given to this 
concept. Other choices could have been made; for example, the primary
measurement might be in terms of materials used, type of capital in-
vestment, nature of ownership or corporate structure, size of organization,
technology, or work-force requirements. Each of these influences
does have an impact on the structure of American industry.
Purposes can be found for which each is better suited as a yardstick
than any of the others. All have in fact a role in the SIC. But it is
“nature of business activity,” i.e., the product or service brought into
the market, that gets first consideration.

AN INDUSTRIAL CLASSIFICATION POLICY 435

3. The Unit to be Classified. The third element of policy is perhaps
the most difficult to resolve. For what unit shall data be reported? We 
seek resolution through a blend of the purpose to which the data will 
be put, and the specification that required information of respondents 
must be available. At least four concepts deserve consideration. The 
broadest is a unit outlined by the span of financial control. Although 
such a unit has relationship to economic power, it is very difficult to 
identify in practice and is too heterogeneous for most purposes. 

The enterprise, or legal entity—corporation, partnership, individual 
doing business as such, or a cooperative association—is a good choice 
from several points of view. It is useful in financial matters, it is related 
to economic power, it is perhaps the least ambiguous unit in the sense 
that it is determinable in practice. But companies cross many industry 
lines, and also State lines, thereby being subject to different laws, regu- 
lations and influences. They also are too heterogeneous for most pur- 
poses. It seems we must look for smaller units which are engaged in rela- 
tively specific, preferably single activities. This notion suggests that the 
unit might be the Department which is engaged either in direct produc- 
tion of a commodity or service or possibly in an ancillary activity such 
as the power plant of a factory. With this choice we secure a unit which 
has a relatively high degree of homogeneity with respect to nature of 
business activity and among such units there probably tends to be 
homogeneity with respect to many of the statistics in which users are 
most interested. Unfortunately, we face new difficulties along this road. 
The boundaries of a Department are not always easy to locate inas- 
much as companies are organized in a variety of ways. There are both 
theoretical and operational difficulties in distinguishing between direct 
and ancillary activities. Finally, and conclusively, in very large num- 
bers of situations, the desired statistics simply are not recorded or main- 
tained on a Departmental basis and cannot be reported in that manner. 

The most suitable unit seems to be the smallest unit for which it is 
possible to provide all the information normally sought in statistical 
surveys. It appears further that this unit must lie between the com- 
pany and the department. The unit which most of us accept is the “es- 
tablishment.” An establishment is usually defined as a single physical
location where business is conducted or where services or industrial
operations are performed; for example, a factory, mill, store, mine, or 
farm. Under certain circumstances, if the single location is comprised 
of two or more Departments for which separate payroll and inventory 
records are maintained and which are engaged in separate and distinct 
activities, each Department may be considered an establishment. 

4. Multi-activity Locations. If there were complete agreement on just
what constitutes an identifiable “nature of business activity,” and if at
each location only one such activity were performed, our task of clas-
sification would be an easier one. We should not then find it necessary
to explore the question of how records were kept, for the location would
be a single establishment. But there are locations at which more than
one product or service is brought into being. Three situations arise.
In the first, there is agreement that the products or services represent 
more than one activity for which separate industrial categories are de- 
sirable, and these activities are such that the necessary records are 
maintained separately for them. In this case, the action is clear: sepa- 
rate categories are established in the List, and each activity (Depart- 
ment) is treated as an establishment. The second case is identical with 
the first, except that the records, while not initially available, can 
through suitable action be created. The third case is the one in which 
either it seems undesirable to separate the activities, or if that be de- 
sirable, it is too costly to produce the necessary records. 
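The three situations can be read as a small decision rule. The sketch below is a hypothetical encoding, not an agency procedure: the enum `Treatment`, the function `treat_location`, and its three yes/no inputs are illustrative assumptions standing in for the judgments the text describes (whether separate categories are wanted, whether records are already kept separately, and whether they can be created at acceptable cost).

```python
from enum import Enum


class Treatment(Enum):
    """Hypothetical labels for the three situations described in the text."""
    SEPARATE_ESTABLISHMENTS = "each activity (Department) is treated as an establishment"
    CREATE_RECORDS = "create separate records, then treat each activity as an establishment"
    SINGLE_ESTABLISHMENT = "one establishment, classified by its principal activity"


def treat_location(n_activities: int,
                   separation_desirable: bool,
                   records_kept_separately: bool,
                   records_creatable_at_acceptable_cost: bool) -> Treatment:
    """Decide how one physical location is divided into establishments (illustrative)."""
    # A location with a single activity is simply one establishment.
    if n_activities <= 1:
        return Treatment.SINGLE_ESTABLISHMENT
    # Third case: separating the activities is judged undesirable.
    if not separation_desirable:
        return Treatment.SINGLE_ESTABLISHMENT
    # First case: separate categories wanted and records already kept apart.
    if records_kept_separately:
        return Treatment.SEPARATE_ESTABLISHMENTS
    # Second case: records not yet available, but they can be created.
    if records_creatable_at_acceptable_cost:
        return Treatment.CREATE_RECORDS
    # Third case again: separation desirable but too costly.
    return Treatment.SINGLE_ESTABLISHMENT


print(treat_location(2, True, False, True).name)  # prints CREATE_RECORDS
```

Only the single-establishment outcome leads on to the question of a principal activity.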

For this third case another question must be answered. What further 
measurement should be used to classify this multi-activity establish- 
ment into a single category? There is little quarrel with the rule, “Clas- 
sify the establishment according to its principal activity, disregarding 
for this purpose all other activities.” There is not unanimity of opinion, 
though, on the proper procedure for determining “principal activity.” 
Without discussing the pros and cons of several possible alternatives, 
let me say that my preference, following the notion of response to eco- 
nomic stimuli, mentioned earlier, is to weigh the different activities by 
the amount of income produced. That which is greatest by this meas- 
ure is the principal activity. In operations, because value of sales is 
usually a good approximation for income produced (for products or 
services at the same stage of production), and because sales figures are 
usually available whereas amount of income produced is not, I would 
use value of sales in selecting the principal activity. 
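As a minimal sketch of the rule just stated, value of sales can stand in for income produced, and the principal activity is the one with the largest sales. The function name is hypothetical. The tie-breaking behaviour (first activity listed wins) is an assumption the text does not address. The figures are the 1950 West Dakota Corporation sales given later in the paper.

```python
def principal_activity(sales_by_activity: dict[str, float]) -> str:
    """Return the activity with the largest value of sales.

    Value of sales stands in for income produced; on a tie, max() keeps
    the first activity encountered in the mapping.
    """
    if not sales_by_activity:
        raise ValueError("at least one activity with sales is required")
    return max(sales_by_activity, key=sales_by_activity.get)


# 1950 sales of the single-establishment West Dakota Corporation (from the text).
west_dakota_1950 = {"electric motors": 55_000, "aircraft parts": 45_000}
print(principal_activity(west_dakota_1950))  # prints: electric motors
```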

5. Length of Time Interval. After the unit, mode, and manner of
classification have been agreed upon, the next question faced is the
length of time interval on which the classification of an establishment
should be based. This determination might be given by the answer to 
the question, “What’s the establishment doing now?” Or it might be 
made on the basis of activity for a week, month, year, or other period. 
For the majority of establishments, the answer will be the same for any 
interval up to a year. But for some it will not. Since there are estab- 
lishments which change activity at very frequent intervals, we would 
not choose a time interval so short that the classification of these firms 
was highly unstable. In fact, we have set stability of classification as a 
desirable feature in accepting the specifications of continuity and com- 
parability. These considerations, plus the fact that seasonality has an 
annual cycle, strongly suggest that classification be based on a 12- 
month period of activity. 
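The instability of short intervals can be shown with a hypothetical seasonal plant. In this sketch, activity "A" runs all year and activity "B" runs only July through October. Both the data and the helper names are invented for illustration. Coded month by month, the plant changes industry twice in a year. Coded on 12 months of sales, it receives one stable code.

```python
from collections import Counter


def principal(sales: dict[str, float]) -> str:
    # Largest value of sales wins; ties go to the first activity listed.
    return max(sales, key=sales.get)


def code_for_window(monthly: list[dict[str, float]], start: int, length: int) -> str:
    """Classify on total sales over monthly[start:start + length]."""
    totals: Counter = Counter()
    for month in monthly[start:start + length]:
        totals.update(month)  # Counter.update adds the sales values
    return principal(dict(totals))


# Hypothetical plant: "A" sells 30 every month, "B" sells 100 in July-October only.
monthly = [{"A": 30.0, "B": 100.0 if 6 <= m <= 9 else 0.0} for m in range(12)]
monthly_codes = [code_for_window(monthly, m, 1) for m in range(12)]
annual_code = code_for_window(monthly, 0, 12)  # B: 400 against A: 360
print(monthly_codes)
print(annual_code)
```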

6. Time Lag or Lead. Closely associated with the length of period 
on which classification is based is the relationship between that period 
of reference for code determination and the period to which the data 
collected and so coded refer. This is largely an operational administra- 
tive problem, but it is also a policy matter. For a historical survey of 
the type of the quinquennial censuses, the answer is fairly clear: these 
surveys normally cover a one-year period, all in the past; data for the 
one-year period are classified according to nature of business for that 
same 12-month period. For a current survey of the type of the BLS 
monthly Employment Statistics series the proper solution is not so 
immediately apparent. Consider the situation, say, in March 1953. In 
a monthly series, the estimates will be classified and published from 
one to two years before the Census-type codes for the year 1953 will 
become known. Several courses are possible, some involving predic- 
tion of activity in the future. The most common practice is perhaps to 
use activity in the previous year as the basis for classification. This dif- 
ference in timing is one of the most troublesome features in securing 
and maintaining comparability among different sets of data. No thor- 
oughly satisfactory solution is known to me. 

7. Frequency of Review. Another dimension of the timing problem— 
or perhaps it is only another way of looking at the lead or lag charac- 
teristic—is the frequency with which the classification of an establish- 
ment should be reviewed, and changed if the activity of the establish- 
ment has changed. With this question let us look simultaneously at 
still another element which interacts with the timing element. 

8. The Effect of Previous Classification Upon Current Classification. 

Should the current classification of an establishment be independent 
of a previous classification? This question brings us face to face with 
perhaps the most vexing and controversial sector of the entire subject. 







Consider this example: In 1950, sales of the West Dakota Corporation 
(a single establishment) came 55 per cent from manufacture of electric
motors and 45 per cent from aircraft parts. Let’s say total sales were 
$100,000. We review the sales again the next year and find the total 
unchanged, but for 1951, activities are reversed in volume and now it 
is aircraft parts that account for 55 per cent of the total. Using the 
principles we have established and data for 1950, the West Dakota 
plant and all its $100,000 sales are classified for 1950 into the electric 
motors industry. If classification is based on 1951 data, and is inde- 
pendent of the earlier coding, statistics for 1951 will show all of the 
plant in the aircraft parts industry. If we are interested exclusively in 
estimates of level, the best decision we can make is that just implied: 
credit motors with $100,000 of activity in 1950, and none in 1951; 
credit aircraft parts with none in 1950, and $100,000 in 1951. This is 
the course advocated by exponents of “current” classification, although
some advocates of the procedure would review the coding at more fre- 
quent than annual intervals. 

In reality the output of this establishment has contributed $55,000 
to motors in 1950 and $45,000 in 1951; and has contributed $45,000 
to the aircraft industry in 1950 and $55,000 in 1951. Current classifica- 
tion, as just defined, certainly does violence to statistics on trend, and 
to our concepts of comparability and continuity. Continuity can be 
maintained and trend reflected in a more nearly accurate manner if the 
1950 classification of the establishment is retained in 1951. The prac- 
tice of keeping the coding of an establishment unchanged from one 
period of time to another constitutes a policy of fixed classification. It,
too, has weaknesses. Even if an establishment changes its activity
slowly, the initial classification may become entirely unrealistic after 
a long enough interval. If the establishment changes rapidly, or dis- 
continues one activity and enters another, the fixed classification be- 
comes misleading in a short while. 
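The West Dakota arithmetic can be checked mechanically. This sketch credits the whole establishment to one industry each year under either policy, using the sales figures from the text. The function and constant names are illustrative only.

```python
# Sales of the West Dakota Corporation, as given in the text.
SALES = {
    1950: {"electric motors": 55_000, "aircraft parts": 45_000},
    1951: {"electric motors": 45_000, "aircraft parts": 55_000},
}


def principal(sales: dict[str, int]) -> str:
    return max(sales, key=sales.get)


def credited(policy: str) -> dict[int, dict[str, int]]:
    """Credit the whole establishment to one industry per year.

    policy="current": code each year from that year's own sales.
    policy="fixed":   keep the code assigned in the first year.
    """
    if policy not in ("current", "fixed"):
        raise ValueError("policy must be 'current' or 'fixed'")
    years = sorted(SALES)
    first_code = principal(SALES[years[0]])
    result = {}
    for year in years:
        code = principal(SALES[year]) if policy == "current" else first_code
        total = sum(SALES[year].values())
        result[year] = {ind: (total if ind == code else 0) for ind in SALES[year]}
    return result


current = credited("current")
fixed = credited("fixed")
print(current)
print(fixed)
```

The actual contributions are `SALES` itself. Under those figures motors fall from $55,000 to $45,000. Current classification shows motors falling from $100,000 to nothing, and fixed classification shows no change at all.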

Is there a way of reconciling these policies, some method which ap- 
proaches current classification in producing accurate levels in statis- 
tics, but still retains to a degree the advantages of fixed classification 
in maintaining comparability and continuity? The answer is yes. There 
are several methods. All, perhaps, can be termed current classification
schemes modified by a resistance or reluctance principle. The essential
feature of these schemes is that the current classification replaces the
previous classification, provided the activity pattern has shifted by an
amount in excess of some standard tolerance; otherwise the previous
classification remains fixed. With suitable side conditions, certain opti-
mum determinations of tolerance can be made. A good many persons
and agencies over a number of years have employed some form of re- 
luctance in coding, but the first formal treatment of the concept known 
to me is in an unpublished memorandum written by Jack L. Ogus of the 
U. S. Census Bureau. 
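One simple form of reluctance can be sketched as follows. The shift is measured here as the margin by which the new leading activity's share of sales exceeds the share of the previously coded activity. That measure, the tolerance values, and the function name are all assumptions for illustration. They are not taken from the Ogus memorandum, which is not reproduced here. A tolerance of zero reduces to current classification, and a tolerance of 1.0 never permits a change, which amounts to fixed classification.

```python
from typing import Optional


def reluctant_code(previous_code: Optional[str],
                   shares: dict[str, float],
                   tolerance: float) -> str:
    """Current classification modified by an (assumed) reluctance rule.

    shares maps each activity to its fraction of the period's sales.
    The new leading activity replaces the previous code only if its share
    exceeds the previous activity's share by more than `tolerance`;
    otherwise the previous code is kept.
    """
    current = max(shares, key=shares.get)
    if previous_code is None:  # first time the establishment is coded
        return current
    margin = shares[current] - shares.get(previous_code, 0.0)
    return current if margin > tolerance else previous_code


# West Dakota 1951 shares (from the text); tolerances are hypothetical.
shares_1951 = {"electric motors": 0.45, "aircraft parts": 0.55}
print(reluctant_code("electric motors", shares_1951, tolerance=0.20))  # electric motors
print(reluctant_code("electric motors", shares_1951, tolerance=0.05))  # aircraft parts
```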

A classification policy must then include decisions on whether to use 
fixed or current classification, or some specific resistance technique, and 
on the frequency of review of activity. 


* * * 


My discussion has recognized eight major components or elements 
of an industrial classification policy. Firm decisions on each of these 
elements are essential to a sound policy. Fortunately, when they have 
been made, one has gone a long way through the planning stages of a 
successful program. It should be added that the program will be more 
cohesive and will run more smoothly if these basic decisions are aug- 
mented by a set of written working rules which cover what might be 
termed pseudo-policy matters, many of which may be peculiar to the 
particular program. I shall make no attempt to enumerate these mat- 
ters, but will illustrate with a few examples: 

(a) Rules for classifying workers and activities which are not localized
geographically.

(b) Precise instructions for distinguishing between ancillary or auxil- 
iary activities which are included with the parent establishment, 
and those which are treated as separate establishments; e.g. ac- 
counting offices are always included with the establishment 
which they serve. 

(c) Definite procedure for adjusting for such coding errors as may be 
discovered. 

(d) Mechanical arrangements such as assignment of an identifying 
number (not name) to each establishment. 

(e) What types of interplant transfers should be considered sales? 


* * * 


In conclusion, I should like to stress these points: (1) Industrial 
classification is a many-sided methodology; explicit decisions and ac- 
tion must, and can, be taken with respect to its major elements. (2) 
Industrial classification is an approximating technique; it does not 
always give the fineness of detail that we might prefer, but because it 
chooses as building blocks the establishments, for which many records 
exist, it yields a wealth of information that perhaps no other selector 
can match. And finally, (3) the importance of the subject to economic 
statistics is difficult to overstate. 





EXPERIMENTAL DESIGNS AND PROBABILITY 
SAMPLING IN MARKETING RESEARCH* 


MAX E. BRUNK AND WALTER T. FEDERER
Cornell University 


GENERAL CONSIDERATIONS IN MARKETING RESEARCH 


IN GENERAL, the problems of marketing research center around the
companion objectives of market development and physical operating
efficiency. Much of the market development for a particular product
depends on ability to determine the economic wants of both actual
and potential consumers. The marketing system operates in an imper- 
fect way in bringing about practices and services most acceptable. In 
the large, the system is so constructed that products are offered to the 
public on a “take-it-or-leave-it” basis with adjustments made by
experience in a slow and cumbersome way. A study of these imperfections
in the system constitutes the most important problem of marketing
research.

A consumer’s decision to buy or not to buy is based on a multitude of 
motivations varying all the way from fickle whims to thorough study 
of value received per dollar spent. Small wonder then that the crude, 
unscientific observations of producers and merchants lead to uneco- 
nomic marketing practices which fail to satisfy the consumer and cost 
the producer and merchant vast sums in lost sales. The problem resolves
itself into one of measuring variables believed to be associated with
volume of consumer purchases.

There are two distinct and conventional avenues of attack on such 
problems: 

(i) The problems may be studied under controlled or laboratory conditions
using experimental designs.

(ii) The problems may be studied under uncontrolled or actual conditions
using sample surveys.


Using the experimental method the researcher must describe and con- 
trol the conditions under which the effects are produced. Variables not 
kept constant must be measured and eliminated statistically. The data 
gathered with the survey method are the everyday experiences of the 
populations under study. Elimination of the effect of non-test variables 
is attempted by stratification in sampling and by statistical analysis 
after the data are gathered. Assuming that this can be done the latter 





* Presented at the American Statistical Association Meetings in Chicago, December 27, 1952. 




EXPERIMENTAL DESIGNS AND PROBABILITY SAMPLING 441 


approach is restricted in that innovation cannot be tested. This is a
serious restriction, for market development per se implies innovation.
Any satisfactory method must meet two major requirements if re- 
sults are to have utility; these are: 
(i) The method must permit relatively satisfactory means of isolating the 
effects of specific variables. 


(ii) The effects of specified variables must be measured under conditions es- 
sentially the same as those found under actual conditions. 


Once these requirements are met the selection of procedure is largely 
one of cost consideration per unit of information. 
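One way to make "cost per unit of information" concrete is to measure information in Fisher's sense, as the reciprocal of the variance of the estimate of interest. Cost per unit of information is then cost times variance. This reading is an assumption, not the authors' stated formula. The costs and variances below are hypothetical.

```python
def cost_per_unit_information(total_cost: float, variance_of_estimate: float) -> float:
    """Cost divided by information, taking information = 1 / variance.

    Dividing by 1 / variance is the same as multiplying by the variance,
    which avoids an extra floating-point division.
    """
    if variance_of_estimate <= 0:
        raise ValueError("variance must be positive")
    return total_cost * variance_of_estimate


# Hypothetical comparison of two procedures that both meet requirements (i) and (ii).
experiment = cost_per_unit_information(total_cost=4_000.0, variance_of_estimate=2.0)
survey = cost_per_unit_information(total_cost=2_500.0, variance_of_estimate=5.0)
cheaper = "experiment" if experiment < survey else "survey"
print(experiment, survey, cheaper)
```

Under these invented figures, the costlier experiment is the cheaper source of information because its estimate is more precise.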

Of paramount importance to the successful solution of a marketing 
research problem is a thorough understanding of the principles in- 
volved and of the nature of the tools employed whether they be sta- 
tistical or otherwise. Thus, it may be necessary to employ a team of 
scientists to effect practical solutions. The statistician advisedly may 
be a member of such a team. 

Certainly the researcher must keep in mind that solutions are noth- 
ing more than a stage in development. In this sense solutions to mar- 
keting problems are sought only in terms of improvement over existing 
practices. The theoretical potential of market development is always 
beyond grasp with the area between present practice and theoretical 
perfection always offering a fertile field for research. 


SOME EXPERIMENTS IN MARKETING RESEARCH 


The remainder of the discussion will be devoted to some illustrations 
of research designed to measure consumer wants for one product,
apples. The coordinated sequence of projects to be described was
undertaken at the request of apple growers who at the outset were of the
opinion that quality of product was one of the most serious factors im- 
peding apple sales. After considerable deliberation it became apparent 
that the industry was more concerned with bruising than any other 
quality problem. 


Studies on Bruising 


In 1948 and 1949 Van Waes undertook to determine the effect of 
bruising on consumer acceptance.¹ Since it was assumed that apples with
different degrees of bruising were in the market place, the survey method
theoretically would have offered a satisfactory tool. However, previous 
experience in attempting to isolate the effect of one particular variable 





1 Van Waes, D. A., Economic Significance of Bruising on Retail Sales of McIntosh Apples, Ph.D. 
Thesis, Cornell University Library, Ithaca, N. Y., 1951. 







from a multitude of others through either stratification or analysis led 
to the use of controlled experiments in which non-test variables could 
be held constant. 

For such a test the self-service supermarket appeared to be made to
order, for in such a store the reactions of customers could be observed
and measured. It would have been a relatively simple matter to run a
series of tests in which matched lots of apples varying only as to de-
gree of bruising were offered to buyers, but in order to be able to fore-
cast sales the tests had to be conducted in an environment simulating
actual marketing circumstances. It was not the general practice for
stores to offer several lots of apples varying only as to bruising. Rela-
tive sales from the various lots would not indicate what actual sales
would be if only one of the lots were offered. Many such experiments
using matched lots have been conducted in the past, but the results are
meaningless in predicting actual sales of either one or the other lot.²

In order to simulate actual conditions it was necessary to have only 
one degree of bruising in a store at any one time, and in order to obtain 
valid comparisons among the various degrees of bruising it was neces- 
sary that they be tested under comparable conditions. Since time and 
store differences represented two major sources of variation, a design 
with two-way elimination of variation was desirable. The latin square 
design was admirably suited for this situation.³ In this design every
treatment (the various degrees of bruising) appeared once in a row
(the particular time interval selected) and once in a column (the store).
The latin square design was found to be very effective in marketing
research for controlling or measuring variations due to store and time
differences.⁴ Therefore, in order to study the effect of bruising on the
volume of apple sales, four degrees of bruising were set up with the lots
of apples alike in all other respects. These four treatments were tested
in three 4×4 latin squares. The columns of the three sets of 4×4 latin
squares were the 12 stores (one in each of 12 cities) in which the experi- 
ment was conducted. The rows were four two-week periods. 
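A randomized plan of this shape can be sketched as follows. The plan uses three 4×4 squares, 12 stores as columns and four periods as rows. The method shown builds a cyclic square and then shuffles its rows, columns and treatment labels. This is one common randomization, not necessarily the one Van Waes used, and it does not draw uniformly from all 4×4 Latin squares. The bruising labels, the seed and the helper names are hypothetical.

```python
import random


def random_latin_square(treatments: list[str], rng: random.Random) -> list[list[str]]:
    """Cyclic n x n Latin square with rows, columns and labels shuffled."""
    n = len(treatments)
    base = [[(r + c) % n for c in range(n)] for r in range(n)]
    rows, cols, labels = list(range(n)), list(range(n)), list(treatments)
    rng.shuffle(rows)    # permuting rows keeps the Latin property
    rng.shuffle(cols)    # so does permuting columns
    rng.shuffle(labels)  # and relabeling the treatments
    return [[labels[base[r][c]] for c in cols] for r in rows]


def is_latin(square: list[list[str]]) -> bool:
    """True if every symbol appears exactly once in each row and column."""
    n = len(square)
    symbols = set(square[0])
    rows_ok = all(len(row) == n and set(row) == symbols for row in square)
    cols_ok = all({square[r][c] for r in range(n)} == symbols for c in range(n))
    return len(symbols) == n and rows_ok and cols_ok


rng = random.Random(1953)  # fixed seed so the plan is reproducible
bruising = ["none", "slight", "moderate", "severe"]  # hypothetical labels
squares = [random_latin_square(bruising, rng) for _ in range(3)]

# plan[(period, store)] -> treatment; stores 1-12 (four per square),
# periods 1-4 (the four two-week periods).
plan = {}
for k, square in enumerate(squares):
    for period in range(4):
        for j in range(4):
            plan[(period + 1, 4 * k + j + 1)] = square[period][j]
```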

As a companion study to the one described above a survey was made 
of randomly selected stores to determine the extent of bruising on ap- 
ples normally on the market. The sample was drawn in the same cities 





2 Van Waes, D. A., “Evaluation of Research Techniques Used for Measuring the Influences of 
Factors Believed to be Associated with Volume of Consumer Purchases in Retail Stores,” Methods of 
Research in Marketing, Paper No. 1, Department of Agricultural Economics, Cornell University, July 
1951. 

3 Fisher, R. A., The Design of Experiments, 5th Edition, Hafner, New York, 1949.

4 Dominick, Jr., B. A., “An Illustration of the Use of the Latin Square in Measuring the Effective- 
ness of Retail Merchandising Practices,” Methods of Research in Marketing, Paper No. 2, Department 
of Agricultural Economics, Cornell University, June 1952. 







included in the controlled experiment. This was done in order that rec- 
ommendations could be made from the results of the controlled experi- 
ment in terms of the extent of bruising on apples actually in the market 
place. 

The companion studies furnished two important items of informa- 
tion: 

(i) The extent of bruising necessary to reduce the volume of apple sales. 

(ii) The extent of bruising on apples found in the market place. 


With the above information it was possible to inform growers and store 
owners that present methods of handling apples were not causing un-
due damage as measured by the volume of apples purchased by cus-
tomers.⁵ Only two per cent of the apples in the 504 sample records were
as badly damaged as the experimental treatment which had the most 
bruising and this treatment was the only one to which buyers responded 
through decreased purchases. Measuring the effect on sales of this two 
per cent would have been very difficult, if not impossible, if only the 
sample survey data had been available. This illustrates one of the dif- 
ferences between controlled and uncontrolled experiments and how the 
two can be combined to advantage. 


Studies of Merchandising Practices 


In the process of making the studies on bruising many varied prac- 
tices of pricing, displaying, and packaging apples were observed to- 
gether with highly varying sales rates. This raised the question of how 
these practices affected sales. To obtain information on this Dominick 
conducted a series of experiments on these as well as innovated varia-
bles.⁶ A series of 4×4 latin squares was used, with 4 stores as columns and
4 time periods of 1 or ½ day as rows (Figure 1). Over a period of 12
weeks, in the fall of 1950, 16 different merchandising practices (the 
treatments) were compared, and 24 individual experiments were con- 
ducted. 

Because approximately half of the volume of grocery sales occurs on
Friday and Saturday and because larger grocery orders per customer 
are purchased on weekends the week was divided into two parts. The 
first part of the week consisted of the first four days. On weekends both 
Friday and Saturday were divided into two parts so that the two days 
combined formed four time periods. Thus there were two latin squares 





5 Brunk, Max E., “Influence of Bruising on the Sale of Apples,” Proceedings New York State Horti-
cultural Society, 95: 73-80, 1950.

6 Dominick, Jr., B. A., Merchandising McIntosh Apples Under Controlled Conditions—Customer 
Reaction and Effect on Sales. Ph.D. Thesis, Cornell University Library, Ithaca, New York, 1952. 







in each week. Each set of four treatments was tested over a two-week
period.
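Each of these 4×4 squares separates store (column) and time-period (row) variation from treatment effects. The sketch below gives the usual fixed-effects sums of squares for one n×n Latin square, with (n − 1)(n − 2) degrees of freedom for error. It shows the standard analysis, not the analysis reported by Dominick. The test data are purely additive and hypothetical, so the error sum of squares comes out zero.

```python
def latin_square_anova(square: list[list[str]], y: list[list[float]]) -> dict[str, float]:
    """Sums of squares for an n x n Latin square: rows, columns, treatments, error."""
    n = len(y)
    grand = sum(sum(row) for row in y) / (n * n)
    row_means = [sum(row) / n for row in y]
    col_means = [sum(y[i][j] for i in range(n)) / n for j in range(n)]
    by_treatment: dict[str, list[float]] = {}
    for i in range(n):
        for j in range(n):
            by_treatment.setdefault(square[i][j], []).append(y[i][j])
    trt_means = [sum(v) / len(v) for v in by_treatment.values()]

    ss_total = sum((y[i][j] - grand) ** 2 for i in range(n) for j in range(n))
    ss_rows = n * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_trt = n * sum((m - grand) ** 2 for m in trt_means)
    ss_err = ss_total - ss_rows - ss_cols - ss_trt  # remainder after the three effects
    df_err = (n - 1) * (n - 2)
    ms_err = ss_err / df_err
    f_trt = (ss_trt / (n - 1)) / ms_err if ms_err > 0 else float("inf")
    return {"ss_total": ss_total, "ss_rows": ss_rows, "ss_columns": ss_cols,
            "ss_treatments": ss_trt, "ss_error": ss_err,
            "df_error": df_err, "F_treatments": f_trt}


# Hypothetical, purely additive data: period effect i, store effect 3j, treatment effect.
square = [["A", "B", "C", "D"], ["B", "C", "D", "A"],
          ["C", "D", "A", "B"], ["D", "A", "B", "C"]]
effect = {"A": 0.0, "B": 2.0, "C": 4.0, "D": 6.0}
y = [[10.0 + i + 3.0 * j + effect[square[i][j]] for j in range(4)] for i in range(4)]
result = latin_square_anova(square, y)
```

With real sales per 100 customers, the error term would be positive. The F ratio for treatments would then be compared with tabled F on 3 and 6 degrees of freedom.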

The treatments selected for testing in any two-week period depended 
largely upon the results of the preceding experiments. This practice 
quickly led to innovations in the selection of treatments, care being 
taken to determine the practicability of any treatment before it was 
included in an experiment. This sequential selection of treatments, al- 
though not formalized by mathematical rule, resulted in the selection 
of 16 different merchandising practices whose sales varied from 11 to
33 pounds of apples per 100 customers.⁷

The most effective treatment, an innovation, was recommended to 
the trade less than a month after the store tests were completed. Within 








[Figure 1 diagram: columns are the four stores; rows are the time periods, Monday, Tuesday, Wednesday and Thursday in the first part of the week, and Friday a.m., Friday p.m., Saturday a.m. and Saturday p.m. in the second part of the week.]

Figure 1. Diagrammatic Lay-out of Two 4×4 Latin Squares
for Four Treatments (A, B, C, D).


two years the treatment, though modified in some cases, was in general 
practice by the trade with over two-thirds of the apples so sold in West- 
ern New York. The widespread application of the results of the experi- 
ment led to many associated problems beyond the scope of the re- 
search. For example, the New York State legislature promptly amended 
the grading laws to facilitate the use of this merchandising practice. 
Also new packing methods were developed on numerous farms. Me- 
chanical bagging equipment was developed and new master shipping 
containers were devised after much trial and error by the trade. 

The final test of the validity and usefulness of any research is the 
experience of actual application. As previously indicated the results of 
these experiments were in wide application very shortly after the tests 
were conducted but only isolated instances of experience are available 





7 Brunk, Max E., and Dominick, B. A., Jr., “Experiments Show What Makes Your Apples Sell,”
Proceedings New York State Horticultural Society, 96: 21-28, 1951. 







to the authors. The recommended method of merchandising apples 
was tried over a twelve-week period in the fall of 1951 in ten stores of a 
large national chain organization.⁸ Their sales increased 42 per cent and
the practice was quickly extended to other stores in the chain. The re- 
sults of the controlled experiments had indicated that the apple sales 
of this organization could be expected to increase 40 per cent. Another 
large chain organization using the innovation of 1952 reported almost 
identical volume (pounds) of sales in 1951 and 1952, but at 60 per cent
higher retail prices. By 1953 practically all chain organizations in the 
country and thousands of independent grocers had incorporated the re- 
sults of these experiments into their merchandising practices. 

In December 1952 apple prices were more than double the prices ex- 
isting during the first tests in 1950 and some question arose concerning 
the effect this price increase might have had on the recommended mer- 
chandising practice which consisted of a combined bulk and polythene 
package display priced in 6 pound units. Consequently a latin square 
experiment was conducted comparing 2, 4 and 6 pound pricing units as 
had been done in 1950. The same stores were used for the tests. Again 
the recommended practice proved most effective in maximizing sales 
with results similar to those obtained in 1950. 


Study of Carry-over Effects 


Many of the treatments tested during 1950 were also retested by Henderson in 1951, under a different price situation and in 12 different stores located in 12 large cities. The conclusions, without exception, were the same as those obtained in 1950 (Table 1). A new feature was incorporated in the design of these experiments.¹⁰ Because the day-to-day rotation of treatments among stores created an artificial condition not normally found in the market place, it was desirable to determine the effect of given treatments on the treatments that followed them. To do this, the treatments were rotated among stores every week instead of every day, and a double change-over design was used.¹¹ In using the change-over design, particular treatments must remain in given stores long enough to ensure that carry-over effects stem only from immediately preceding




8 Davis, Lloyd H., “Marketing Research Results Work,” Cornell Farm Economics, 186: 4888-4889, October 1952.

9 Henderson, P. L., Influence of Selected Marketing Services on Apple Sales, Ph.D. Thesis, Cornell University Library, Ithaca, New York, 1952; and Brunk, Max E., “How We Increased the Retail Sales of Apples,” Proceedings New York State Horticultural Society, 97: 24-33, 1952.

10 Henderson, P. L., “Application of the Double Change-over Design to Measure Carry-over Effects of Treatments in Controlled Experiments,” Methods of Research in Marketing, Paper No. 3, Department of Agricultural Economics, Cornell University, July 1952.

11 Cochran, W. G., Autrey, K. M., and Cannon, C. Y., “A Double Change-over Design for Dairy Cattle Feeding Experiments,” Journal of Dairy Science, 24: 937-951, 1941.





446 AMERICAN STATISTICAL ASSOCIATION JOURNAL, SEPTEMBER 1953 


TABLE 1
EFFECT OF MERCHANDISING PRACTICES ON APPLE SALES

September-December 1950                                Pounds per
Practice                                               100 customers

Promotional devices (All 4 lb. bag and bulk)
    Display without promotional devices
    Display marked as to variety and use
    With window streamers added
    Display doubled in size
    With added window display of apples
Bulk only
    Priced in two-pound units
    Priced in four-pound units
Package only
    Four-pound Cellophane bags
Combination package and bulk displays
    Two-pound Cellophane bags and bulk
    Four-pound Cellophane bags and bulk
    Four-pound Polythene bags and bulk
    Six-pound Polythene bags and bulk
    Six-pound open hi-hat baskets and bulk
Quality and price (All 4 lb. bag and bulk)
    23” min. priced 25% under 23”                      17
    Bruise-free apples                                 24
    Price reduced 35%                                  29
    Highly colored apples                              33

September-December 1951                                Pounds per
Practice                                               100 customers

Packaging material (All in 5 lb. units)
    No packages, bulk only
    Red mesh bags and bulk
    Paper window bags and bulk
    Pliofilm bags and bulk
    Purple mesh bags and bulk
    Polythene bags and bulk
Size of pricing unit
    Four-pound Polythene bags and bulk
    Five-pound Polythene bags and bulk
    Six-pound Polythene bags and bulk
    Eight-pound Polythene bags and bulk
    Five-pound mesh bags and bulk
    Eight-pound mesh bags and bulk
    Ten-pound mesh bags and bulk
Location of display
    By scales
    End of counter next to no fruit
    End of counter next to oranges
    End of counter next to bananas





treatments and not from earlier treatments.¹² It is believed that weekly rotations are satisfactory with most perishable foods, particularly in view of people's weekly shopping habits.





12 The double change-over design consists of k − 1 orthogonal k×k latin squares, in which the treatments are compared in various sequences. It retains the advantages of the latin square in eliminating store and time effects and at the same time permits the measurement of carry-over effects. When carry-over effects are not present, k − 1 ordinary latin squares may be used instead of the double change-over design. If carry-over effects are present and a double change-over design is used, the treatment means are adjusted for the effect of the preceding treatment. Such adjustments tend to reduce the experimental error and to give unbiased comparisons of the treatment effects.







The double change-over design was found useful in measuring the effect of the Thanksgiving and Christmas holiday trade. In one such instance, using two orthogonal 3×3 latin squares, the carry-over effects of the treatments in one square were the reverse of those in the second. The second square was completed just before Christmas, a time when customers were buying relatively more of the larger packages than they had during the first square. Thus the design proved useful in detecting variation in buying habits at different times during the season.
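Footnote 12 describes the double change-over design as k − 1 orthogonal k×k latin squares. For prime k, one standard way to obtain such a set is the cyclic construction L_m(p, s) = (m·p + s) mod k for m = 1 to k − 1, with rows as periods p and columns as stores s. The sketch below builds the set, checks orthogonality, and counts how often each treatment immediately follows each other treatment within a store; equal counts are what make the first-order carry-over effects estimable. It illustrates the idea under those assumptions and is not necessarily the set of squares Henderson used.

```python
from collections import Counter

def double_changeover(k):
    """The k - 1 cyclic latin squares L_m[p][s] = (m*p + s) mod k, m = 1..k-1.

    For prime k they are mutually orthogonal. Rows are periods, columns are
    stores, so each column is one store's sequence of treatments.
    """
    return [[[(m * p + s) % k for s in range(k)] for p in range(k)]
            for m in range(1, k)]

def orthogonal(a, b):
    """Two k x k squares are orthogonal if superimposing them gives all k*k pairs."""
    k = len(a)
    return len({(a[i][j], b[i][j]) for i in range(k) for j in range(k)}) == k * k

def carryover_counts(squares):
    """Count (preceding treatment, following treatment) pairs within stores."""
    counts = Counter()
    for sq in squares:
        k = len(sq)
        for s in range(k):
            for p in range(1, k):
                counts[(sq[p - 1][s], sq[p][s])] += 1
    return counts

squares = double_changeover(3)          # two orthogonal 3 x 3 squares
counts = carryover_counts(squares)
print(orthogonal(*squares), sorted(counts.values()))
```

With k = 3 every ordered pair of distinct treatments occurs equally often as a (preceding, following) pair, and no treatment ever follows itself, which is the balance the adjustment for the preceding treatment relies on.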


Comments on Techniques and Efficiencies 


The above discussion illustrates the application of two very useful designs to marketing research experiments. Other designs, such as the randomized complete block, the split plot, and the lattices, may of course be used successfully for studying certain marketing problems. The particular nature of the problem and the sources of variation will determine the appropriate experimental design.

Missing values for the period of observation, or “missing plots,” may and do occur in marketing research studies just as they do in other fields of research. Failure to keep records, or the loss of records, is only one source of omission. Sometimes unforeseen developments occur, such as street repairs in front of a store over a period of time. If the street in front of a store is torn up, the customer count may decline far more than total sales, because the obstruction affects small sales more than large ones. Fire or flood may also prevent a store from operating in the accustomed manner. Experiments with missing observations may be analyzed in the usual manner, as described by Cochran and Cox, Snedecor and others.
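For a single missing observation in a k×k latin square, the standard missing-plot estimate (due to Yates) is x = [k(R + C + T) − 2G] / [(k − 1)(k − 2)], where R, C and T are the observed totals of the row, column and treatment containing the missing cell and G is the observed grand total. A minimal sketch, assuming the data are held as nested lists with None marking the missing cell; the function and argument names are illustrative only.

```python
def estimate_missing(values, treatments):
    """Missing-plot estimate for one empty cell of a k x k latin square.

    values[i][j] is the observation in row i, column j (None if missing);
    treatments[i][j] is the treatment applied to that cell.
    """
    k = len(values)
    missing = [(i, j) for i in range(k) for j in range(k) if values[i][j] is None]
    if len(missing) != 1:
        raise ValueError("this formula handles exactly one missing cell")
    r, c = missing[0]
    t = treatments[r][c]
    observed = [(i, j) for i in range(k) for j in range(k) if values[i][j] is not None]
    R = sum(values[r][j] for j in range(k) if j != c)              # row total
    C = sum(values[i][c] for i in range(k) if i != r)              # column total
    T = sum(values[i][j] for i, j in observed if treatments[i][j] == t)
    G = sum(values[i][j] for i, j in observed)                     # grand total
    return (k * (R + C + T) - 2 * G) / ((k - 1) * (k - 2))

# Usage: exactly additive 4 x 4 data, so the estimate should recover the true value.
trt = [[(i + j) % 4 for j in range(4)] for i in range(4)]
row_eff, col_eff, trt_eff = [0, 2, -1, 3], [1, 0, -2, 4], [5, 0, 1, -3]
data = [[20 + row_eff[i] + col_eff[j] + trt_eff[trt[i][j]] for j in range(4)]
        for i in range(4)]
true_value = data[1][2]
data[1][2] = None
print(true_value, estimate_missing(data, trt))
```

The estimate is inserted in place of the missing cell before the usual analysis, and the error degrees of freedom are reduced by one.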

To obtain an idea of the effect of stratification by time intervals and by stores, the results of 34 experiments (Table 2) were studied. As a measure of relative variation, the coefficient of variation was computed for each experiment. The coefficients of variation were higher for the 24 experiments conducted in 1950 than for the others. In these experiments the time interval was one day, while in the remaining experiments it was either one or two weeks. Thus, one method of reducing the coefficient of variation is to use time periods of one week rather than one day. It should be noted that the coefficient of variation was computed from the residual mean square in the latin square without covariance.
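The coefficient of variation referred to here is 100·√(error mean square) divided by the grand mean, with the error mean square taken from the ordinary latin square analysis of variance (rows, columns and treatments removed, leaving (k − 1)(k − 2) degrees of freedom). A minimal sketch under that reading; the data layout, the function names and the sample figures are hypothetical.

```python
def latin_square_anova(values, treatments):
    """Mean squares for a k x k latin square (rows = periods, columns = stores)."""
    k = len(values)
    n = k * k
    grand_total = sum(map(sum, values))
    cf = grand_total ** 2 / n                      # correction factor
    total_ss = sum(v * v for row in values for v in row) - cf
    row_ss = sum(sum(row) ** 2 for row in values) / k - cf
    col_ss = sum(sum(values[i][j] for i in range(k)) ** 2 for j in range(k)) / k - cf
    trt_totals = {}
    for i in range(k):
        for j in range(k):
            t = treatments[i][j]
            trt_totals[t] = trt_totals.get(t, 0.0) + values[i][j]
    trt_ss = sum(tot ** 2 for tot in trt_totals.values()) / k - cf
    error_ss = total_ss - row_ss - col_ss - trt_ss
    ms = {
        "rows": row_ss / (k - 1),
        "columns": col_ss / (k - 1),
        "treatments": trt_ss / (k - 1),
        "error": error_ss / ((k - 1) * (k - 2)),
    }
    return ms, grand_total / n

def coefficient_of_variation(values, treatments):
    """100 * sqrt(error mean square) / grand mean, in per cent."""
    ms, grand_mean = latin_square_anova(values, treatments)
    return 100 * ms["error"] ** 0.5 / grand_mean

# Hypothetical pounds per 100 customers: 4 days (rows) x 4 stores (columns).
sales = [[34, 28, 41, 22],
         [30, 25, 36, 27],
         [38, 31, 33, 29],
         [26, 35, 30, 24]]
practices = [[(i + j) % 4 for j in range(4)] for i in range(4)]
print(f"CV = {coefficient_of_variation(sales, practices):.2f} per cent")
```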

The efficiencies of the latin square relative to randomized complete 
block designs using stores as replicates are given in column five of Ta- 







TABLE 2

RELATIVE VARIATION AND EFFICIENCY DUE TO STRATIFICATION
OR COVARIANCE IN 34 LATIN SQUARE ANALYSES

Size of       Year       Coefficient    Efficiency relative to (per cent)                   Efficiency using
latin square  conducted  of variation   Randomized complete blocks     Completely           covariance
                         (per cent)     using as replicates            randomized           analysis
                                        Stores         Times                                (per cent)

8×8           1948       19.            214            214             ...                  ...
4×4           1949       17.            112            702             ...                  ...
4×4           1949       15.            113            243             ...                  ...
4×4           1949       ...            146            1241            ...                  ...
4×4           1950       45.            120            143             ...                  ...
4×4           1950       30.            341            126             ...                  ...
4×4           1950       31.            108            141             ...                  ...
4×4           1950       25.            210            149             ...                  ...
4×4           1950       32.73          90             192             ...                  ...
4×4           1950       25.75          220            181             ...                  ...
4×4           1950       37.16          124            159             ...                  ...
4×4           1950       37.93          225            152             ...                  ...
4×4           1950       40.58          98             71              ...                  ...
4×4           1950       47.40          102            128             ...                  ...
4×4           1950       31.06          184            185             ...                  ...
4×4           1950       36.72          115            150             ...                  ...
4×4           1950       34.64          95             158             ...                  ...
4×4           1950       24.23          286            125             ...                  ...
4×4           1950       38.32          152            101             ...                  ...
4×4           1950       47.79          108            94              ...                  ...
4×4           1950       39.60          225            120             ...                  ...
4×4           1950       42.41          152            128             ...                  ...
4×4           1950       53.48          80             82              ...                  ...
4×4           1950       21.96          135            301             ...                  ...
4×4           1950       36.24          121            284             ...                  ...
4×4           1950       27.88          116            152             ...                  ...
4×4           1950       48.45          84             91              ...                  ...
4×4           1950       19.77          141            166             ...                  ...
6×6           1951       19.19          132            226             ...                  ...
6×6³          1951       16.67          101            368             ...                  ...
4×4           1951       6.09           72             3237            ...                  ...
4×4¹          1951       22.34          182            3921            ...                  ...
4×4³          1951       6.72           283            9977            ...                  ...
4×4²          1951       14.13          170            731             ...                  ...

¹ Other apples.
² All apples.
³ Oranges.
a Covariance on volume of grocery and produce sales.
b Covariance on number of customers.
c Covariance on volume of produce sales.







ble 2. If the time interval stratification is ignored, the average error 
variance in the 24 experiments in 1950 is 51 per cent larger, and the median increase in efficiency of the latin square over the randomized complete block is 22.5 per cent. If the store stratification is ignored but the time period grouping is not, the average increase in efficiency of the latin square is 49 per cent, while the median increase is 42 per cent (column 6, Table 2). If neither the store nor the time interval variation is controlled, the average increase in the error mean square for the completely randomized design in these same 24 experiments is 77 per cent, and the median increase is 62 per cent. The other experiments were not included in these averages because their periods of observation were of different length.
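The efficiencies in Table 2 compare the latin square with designs that ignore one or both stratifications. With E_r, E_c and E_e the row, column and error mean squares of a t×t square, the standard estimates are [E_r + (t − 1)E_e]/(tE_e) relative to randomized blocks that keep only the columns, and [E_r + E_c + (t − 1)E_e]/[(t + 1)E_e] relative to complete randomization. The sketch assumes rows are time periods and columns are stores, so "stores as replicates" means the time stratification is ignored. It then reproduces the 51 per cent average and 22.5 per cent median quoted above from the stores figures for the 24 experiments of 1950 as they read in Table 2.

```python
from statistics import mean, median

def latin_square_efficiencies(ms_rows, ms_cols, ms_error, t):
    """Per cent efficiency of a t x t latin square relative to simpler designs.

    Assumes rows are time periods and columns are stores.
    """
    # Randomized blocks with stores as replicates: time stratification ignored.
    rcb_stores = (ms_rows + (t - 1) * ms_error) / (t * ms_error)
    # Randomized blocks with times as replicates: store stratification ignored.
    rcb_times = (ms_cols + (t - 1) * ms_error) / (t * ms_error)
    # Completely randomized: both stratifications ignored.
    crd = (ms_rows + ms_cols + (t - 1) * ms_error) / ((t + 1) * ms_error)
    return {"stores": 100 * rcb_stores, "times": 100 * rcb_times,
            "completely_randomized": 100 * crd}

# Stores-as-replicates efficiencies (per cent) for the 24 experiments of 1950, from Table 2.
stores_1950 = [120, 341, 108, 210, 90, 220, 124, 225, 98, 102, 184, 115,
               95, 286, 152, 108, 225, 152, 80, 135, 121, 116, 84, 141]
print(f"average increase {mean(stores_1950) - 100:.0f} per cent, "
      f"median increase {median(stores_1950) - 100} per cent")
# -> average increase 51 per cent, median increase 22.5 per cent
```

An efficiency of 151 per cent means a randomized complete block design would have needed about 1.5 times as many observations to reach the same precision.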

The analysis of covariance of apple sales and total number of customers, total grocery sales, or total produce and grocery sales was of limited usefulness in these studies. The removal of store and time interval differences in the latin square accounted for most of the relationship between the covariate and the volume of apple sales; the residual variations were not appreciably related. Had the variation due to stores and time intervals not been removed, covariance analysis could be expected to decrease the error variance considerably, but not to the extent that the latin square did. In other studies the use of covariance analysis may prove quite beneficial.
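A covariance adjustment within the latin square works on the error sums of squares and products left after rows, columns and treatments are removed. With E_yy, E_xx and E_xy for apple sales y and a covariate x (number of customers, say), the regression coefficient is b = E_xy/E_xx and the adjusted error mean square is (E_yy − E_xy²/E_xx)/(f − 1), where f is the unadjusted error degrees of freedom. The sketch below also applies the common approximation that inflates the adjusted mean square by 1 + T_xx/[(t − 1)E_xx] when computing the efficiency of covariance; the function names and this choice of efficiency measure are assumptions for illustration, not the authors' reported computation.

```python
def latin_sp(x, y, trt):
    """Sums of products of x and y for treatments and error in a k x k latin square."""
    k = len(x)
    cf = sum(map(sum, x)) * sum(map(sum, y)) / (k * k)          # correction factor
    total = sum(x[i][j] * y[i][j] for i in range(k) for j in range(k)) - cf
    rows = sum(sum(x[i]) * sum(y[i]) for i in range(k)) / k - cf
    cols = sum(sum(x[i][j] for i in range(k)) * sum(y[i][j] for i in range(k))
               for j in range(k)) / k - cf
    tx, ty = {}, {}
    for i in range(k):
        for j in range(k):
            tx[trt[i][j]] = tx.get(trt[i][j], 0.0) + x[i][j]
            ty[trt[i][j]] = ty.get(trt[i][j], 0.0) + y[i][j]
    trts = sum(tx[t] * ty[t] for t in tx) / k - cf
    return {"trt": trts, "error": total - rows - cols - trts}

def covariance_efficiency(y, x, trt):
    """Return (b, per cent efficiency of the covariance analysis)."""
    k = len(y)
    df_error = (k - 1) * (k - 2)
    eyy = latin_sp(y, y, trt)["error"]
    xx = latin_sp(x, x, trt)
    exy = latin_sp(x, y, trt)["error"]
    b = exy / xx["error"]                                   # within-error regression slope
    ms_unadjusted = eyy / df_error
    ms_adjusted = (eyy - exy ** 2 / xx["error"]) / (df_error - 1)
    # Approximate effective error mean square for comparing adjusted treatment means.
    effective = ms_adjusted * (1 + xx["trt"] / ((k - 1) * xx["error"]))
    return b, 100 * ms_unadjusted / effective
```

When rows and columns have already absorbed most of the association between x and y, E_xy is small relative to E_xx and E_yy, the adjusted mean square is close to the unadjusted one, and the efficiency stays near 100 per cent, which matches the limited usefulness reported above.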


A Sampling Program 


Having effected material improvement in the merchandising of apples and having ascertained some of the important factors affecting their sales, the industry was anxious to use this information to achieve an orderly movement of the crop into consumption. Experience from the previous work indicated that observations of sales coupled with customer counts might serve as an indicator of the week-to-week rate of movement. The rate of movement, together with descriptions of store practices, would enable the industry to undertake remedial action as soon as undesirable developments occurred. Over a large number of stores the movement rate could be affected by a number of factors, chief among which are: (1) the merchandising practice used, (2) the proportion of stores handling apples, (3) the relative display space devoted to apples, (4) the prices of apples and other fruits, and (5) the quality and condition of apples and competing fruits.

Even though previous experience had revealed a high degree of con- 
sistency in the customer reactions to different selling practices among 
different stores, there still remained a tremendous problem of how to 







efficiently sample stores over a wide geographic area. Published lists of stores were available for most towns and cities in the western half of New York State, which was chosen for study, but it was desirable to know something of the effects of the geographic area, size of city, size of store, day of week and time of day on the rates of sale. To measure all these variables within a relatively restricted budget, some form of experimental design seemed to offer definite advantages. The first purpose of the study was to learn more about the influence of these factors on rate of sale. The second purpose was to provide a crude measure of the week-to-week movement rate for release to the trade until more adequate coverage could be obtained. At the outset it was decided that the second purpose should be subordinate to the first.

Since the unit of observation in this study was the customer in the store, the question might arise as to why people were not interviewed in their homes, or why rate-of-movement information was not obtained from weekly store inventories. Direct observation of actual customer performance has many advantages: it avoids memory biases, enables enumerators to cover much larger numbers of shoppers, and associates specific merchandising practices with shopper performance. Even assuming that accurate store inventories could be obtained (and there is good reason to doubt it), there would still remain the problem of determining how the product was merchandised and how shoppers responded to that practice.

Because the sales rate on weekends differed considerably from that in the first part of the week, it was decided to make one visit to each selected store in each part of the week and, during each visit, to record customer counts and sales for a one-hour period. The budget limited such coverage to 64 stores. The area selected was Western New York, which was divided into four geographic areas. In each of these areas, cities in 4 size classes were selected. Lists were prepared of all places with over 100,000 population, 20,000-100,000, 5,000-20,000 and under 5,000.

It so happened that there was only one city in each area with over 100,000 population, so these were automatically selected. Random selections were then made, in each area, of one city from the second size group, 2 cities from the third and 4 cities from those under 5,000 population. Many small cities are clustered around the larger ones, and the shopping areas for the smaller places are in the larger cities. For this reason it was necessary to require that any smaller city selected be at least 10 miles from a larger city. Routes were then constructed for each area with 4 stores in each of the two larger







sized cities, 2 in each of the cities of 5,000 to 20,000 and 1 in each of the towns under 5,000 population. Lists in each area were used to select stores, half of them small and half large. The plan was constructed so that visits to any one store were made in succeeding weeks at precisely the same hour and day of week. Within any one two-hour period throughout the week, one store was enumerated in each of the 4 geographic areas and in each of the 4 city sizes, and half of these stores were small and half were large.
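The allocation just described can be checked with a few lines of arithmetic: in each area the two largest cities contribute 4 stores each, the two cities of 5,000 to 20,000 contribute 2 each, and the four towns under 5,000 contribute 1 each. The dictionary layout below is only an illustration; the area names are those of Figure 2.

```python
# (cities selected per area, stores enumerated per city), as described in the text
ALLOCATION = {
    "over 100,000": (1, 4),
    "20,000-100,000": (1, 4),
    "5,000-20,000": (2, 2),
    "under 5,000": (4, 1),
}
AREAS = ["Buffalo", "Binghamton", "Syracuse", "Rochester"]

stores_per_class = {size: cities * per_city for size, (cities, per_city) in ALLOCATION.items()}
stores_per_area = sum(stores_per_class.values())
total_stores = stores_per_area * len(AREAS)
print(stores_per_class)
print(stores_per_area, total_stores)   # 16 stores per area, 64 in all
```

Each city-size class therefore supplies 4 stores per area, which is what allows one store of each size class to be visited in every area during each two-hour period.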

All combinations of the variables (geographic area, city size, day of week, time of day and size of store) constitute a 2×4⁴ factorial. The possible combinations total 512. From these combinations the 64 given
in Figure 2 were selected. The fractional replicate selected was constructed so as not to confound main effects. The time periods within each week were divided to permit two complete sets of the 64 combinations: one set during slack trading hours and one during heavy trading hours. Thus each store in the design was enumerated twice a week and at precisely the same hours in succeeding weeks.

[Figure 2: the 64 selected combinations of levels of factors a, b, c, d and e]
Levels of factors:
a₀ = large store; a₁ = small store
b₀ = cities over 100,000; b₁ = cities between 20,000 and 100,000; b₂ = cities between 5,000 and 20,000; b₃ = cities under 5,000
c₀ = Buffalo area; c₁ = Binghamton area; c₂ = Syracuse area; c₃ = Rochester area
d₀ = Monday; d₁ = Tuesday; d₂ = Wednesday; d₃ = Thursday (for first part of week)
e₀ = 8 a.m. to 10 a.m.; e₁ = 10 a.m. to noon; e₂ = noon to 2 p.m.; e₃ = 2 p.m. to 4 p.m. (for first part of week)
Figure 2. Sixty-four Treatments Used in Studying Rate of Movement.
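The properties the fraction must have can be illustrated even though the individual entries of Figure 2 are not reproduced. The sketch below is a hypothetical construction, not the authors' fraction. It treats the four-level factors b (city size), c (area), d (day) and e (time) as 2-bit numbers, takes all 64 combinations of b, d and e, and sets c = b XOR d XOR e and store size a = low bit of (b XOR d). It then checks that every pair of factors occurs in all level combinations equally often, so no main effect is confounded with another (main effects are still partly aliased with interactions, as in any such fraction). It also checks that each two-hour period (d, e) contains one store in each area and each city size, two large and two small.

```python
from collections import Counter
from itertools import combinations, product

def fraction_64():
    """Hypothetical 64-run main-effect fraction of the 2 x 4^4 factorial."""
    runs = []
    for b, d, e in product(range(4), repeat=3):
        c = b ^ d ^ e          # area level; XOR keeps c balanced against b, d and e
        a = (b ^ d) & 1        # store size: 0 = large, 1 = small
        runs.append({"a": a, "b": b, "c": c, "d": d, "e": e})
    return runs

def main_effects_orthogonal(runs, factors="abcde"):
    """True if every pair of factors shows each level combination equally often."""
    for f, g in combinations(factors, 2):
        counts = Counter((r[f], r[g]) for r in runs)
        n_f = len({r[f] for r in runs})
        n_g = len({r[g] for r in runs})
        if len(counts) != n_f * n_g or len(set(counts.values())) != 1:
            return False
    return True

def periods_balanced(runs):
    """Each (day, time) period: one store per area and city size, two large and two small."""
    for d, e in product(range(4), repeat=2):
        slot = [r for r in runs if r["d"] == d and r["e"] == e]
        if (sorted(r["b"] for r in slot) != [0, 1, 2, 3]
                or sorted(r["c"] for r in slot) != [0, 1, 2, 3]
                or sum(r["a"] for r in slot) != 2):
            return False
    return True

runs = fraction_64()
print(len(runs), main_effects_orthogonal(runs), periods_balanced(runs))
```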

Weekly enumerations were completed each Saturday night at 6 o'clock. Office tabulations were made daily as the field reports were received, so that summaries for each preceding week were completed by Monday noon. These completed reports on movement rate were mailed to the trade by Monday evening. The greatest delay in tabulation resulted from making adjustments for the non-proportional sampling necessitated by the experimental design. The summaries reported the rate of sale per 100 customers, quality indices and retail prices of each variety, size of pricing unit, and a description of display practices, as well as the qualities, prices and display space of other fruit. Experience has shown that these factors are associated with rate of movement, and the information proved useful to the trade in taking remedial action to keep the movement of apples into consumption consistent with storage inventories.
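The text does not give the weights used to adjust for the non-proportional sampling. A common approach is to compute the rate of sale within each stratum and combine the stratum rates with weights proportional to each stratum's share of all customers. The sketch below does that; the strata and the weights in the usage line are hypothetical.

```python
def weighted_rate(strata):
    """Rate of sale (pounds per 100 customers) combined across strata.

    strata: list of (weight, pounds_sold, customers_counted) tuples, where
    weight is the stratum's assumed share of all customers (weights sum to 1).
    """
    total_weight = sum(w for w, _, _ in strata)
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError("stratum weights must sum to 1")
    return sum(w * 100.0 * pounds / customers for w, pounds, customers in strata)

def unweighted_rate(strata):
    """Pooled rate that ignores the sampling design, for comparison."""
    pounds = sum(p for _, p, _ in strata)
    customers = sum(c for _, _, c in strata)
    return 100.0 * pounds / customers

# Hypothetical week: two strata sampled equally but of unequal size in the population.
week = [(0.7, 300.0, 1000.0), (0.3, 100.0, 1000.0)]
print(weighted_rate(week), unweighted_rate(week))
```

Here the pooled figure (20 pounds per 100 customers) understates the weighted one (24) because the larger stratum, which has the higher rate, was sampled no more heavily than the smaller one.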

Combining probability sampling with an experimental design in this instance served to evaluate certain variables for use in designing an improved sample for future use, and at the same time permitted a rough estimate of the current movement situation together with its associated causes.







IMPROVING NATIONAL MARRIAGE AND 
DIVORCE STATISTICS* 


HUGH CARTER
National Office of Vital Statistics 


THE principal objectives of the program for improving our present marriage and divorce statistics are to provide prompt and comprehensive data on marriages and divorces that occur in the United States and to give such details regarding the social characteristics of the persons involved as are needed by users of these statistics. The rate of formation of new families, as well as the rate of dissolution of established families, is of interest to sociologists, economists, demographers, social workers, and many other professional and business groups.
Statisticians concerned with population projections have recently 
shown an increased interest in the role of marriage data as an aid in 
forecasting births. At present, international comparisons of marriage 
statistics emphasize the incompleteness of the United States figures. 
Distribution of the population by marital status is given in Bureau 
of the Census data; for 1950 the figures are available with considerable 
detail as to social characteristics. By contrast, the registration of 
marriages, or divorces, for 1950 provides a count of occurrences within 
the year and information concerning the social characteristics of the 
individuals at the time of registration. Since registration statistics are 
based upon legal documents, certain types of closely related events, 
such as consensual or common law marriages, are not included in 
these periodic counts. Only final decrees of absolute divorce are tabulated; limited decrees and separations are excluded. The present paper will review the steps now being taken to improve national marriage and divorce statistics and indicate some of the problems involved. As
background it will summarize the earlier efforts of the Federal Govern- 
ment in this field. Registrations occur in local communities, typically 
in the community that is the county seat. In a majority of the States, 
a record of marriage or divorce is transmitted to the State Registrar of 
Vital Statistics. 

Improvement of marriage and divorce statistics can take place only 
on the basis of close cooperation between the federal, State and local 
agencies involved. Fortunately, such cooperation is already well ad- 
vanced. State registrars of vital statistics are accustomed to close 
cooperation with local officials. The pattern of this cooperation has 
been hammered out over the years through other programs, such as 





* Presented to the American Statistical Association Meeting in Chicago, December 30, 1952. 









registration of births and deaths. In most States, the State registrar 
maintains a field staff to work closely with local officials. 

Cooperation between federal and State agencies is greatly facilitated 
by the Public Health Conference on Records and Statistics, hereafter 
referred to as the Conference. This organization was created to provide 
for interchange and discussion of ideas and problems relating to 
public health statistical programs and to encourage cooperative
action by the representatives of federal, State, and other organizations 
included in the membership. Represented at the Conference are indi- 
viduals concerned with registration and health statistics activities of 
each State, Territory, and independent registration area of the United 
States. Also part of the Conference are representatives of the American 
Association of Registration Executives, the American Public Health 
Association, and the National Office of Vital Statistics of the Public 
Health Service. The Working Group on Marriage and Divorce Regis- 
tration of the Conference has for some years been preparing a compre- 
hensive federal-State program of marriage and divorce registration 
and statistics. 

Before proceeding with the discussion of plans for improving mar- 
riage and divorce statistics, it may be useful to glance at the history of 
the registration and reporting of marriages and divorces and to note 
what data are presently available on a yearly or monthly basis. It 
will be evident from this survey that the past century has witnessed 
substantial improvement in the reporting of these data. While there 
have been many serious set-backs to the program, and while much 
ground remains to be covered, the trend has been toward greater 
completeness and comprehensiveness of reporting. 

Statistics on marriages for the United States were collected in the 
Decennial Census of 1850 and in several subsequent censuses, with 
admittedly “very deficient” results.¹,² Marriages and divorces during the period 1867-1906 were compiled in two surveys based on the original records in county seats.³,⁴ During the next 15 years, except for 1916, no national statistics on marriages and divorces were collected; but beginning with data for 1922, the Bureau of the Census undertook an annual collection program which continued for 11 years.⁵ In 1928
it published estimates for the missing years 1907-15 and 1917-21. 





1 Population of the United States in 1860, Census Office, 8th Census, 1864, p. XXXVI.

2 The Statistics of the Population of the United States, 1870, Vol. I, Census Office, 9th Census, 1872, p. XXIX.

3 Marriage and Divorce in the United States, 1867 to 1886, by Carroll D. Wright, Commissioner of Labor, 1889 (out of print).

4 Marriages and Divorces, 1867-1906, Bureau of the Census, 1908 (out of print).

5 Marriage and Divorce, Annual Reports, 1922-32, Bureau of the Census, 1925-34.

6 Ibid., 1926.





IMPROVING MARRIAGE AND DIVORCE STATISTICS 455 


For the years 1933-36, the best available national estimates are those of Stouffer and Spencer.⁷

During the 1940 Census period, the Bureau instituted marriage and divorce collection programs patterned after those already operating in the collection of data on births and deaths. This program was short-lived but helped produce enough data for estimates to be made for the years 1937 through 1940.⁸ A program based on data collected by mail from a variety of sources was begun several years later,⁹ and in July 1946 the function was transferred to the Public Health Service as an integral part of the National Office of Vital Statistics.

Annual summaries of marriage and divorce statistics for the United 
States, by State, have been published for the years 1946 through 1950. 
For a substantial number of States it has been necessary to use figures 
for “marriage licenses” rather than “marriages,” and for a few States, 
where reporting was incomplete, estimates have been made. The figures 
are tabulated by the State in which the marriage or divorce occurred. 

Monthly national and State figures on marriages (licenses or marriages reported)¹⁰ are obtained from 25 State offices and from local
officials in 23 States. Other monthly figures include marriage licenses 
for each of the major cities and divorces and annulments for 19 States. 

For the specified States that can provide the data, the National 
Office of Vital Statistics also publishes detailed reports on marriage 
and divorce. This cooperative project does not include all of the States. 
The marriage report¹¹ gives ages of bride and groom by first marriage and remarriage, race, and residence or nonresidence in State of occurrence. The report on detailed statistics of divorce¹² includes tables on legal grounds for divorce, party to whom the decree was granted, duration of marriage, and numbers of children reported. Both of these reports contain a number of tables with detailed cross tabulations of the data. From time to time special studies are published, the most recent being an analysis of seasonality in marriage licenses.¹³ Since
1946, statistics on marriages and divorces in the United States have 





7 “Recent Increases in Marriage and Divorce,” American Journal of Sociology, January, 1939, 551-54.

8 Estimated Number of Marriages by State: United States, 1937-40, Bureau of the Census, 1942; and Estimated Number of Divorces by State: United States, 1937-40, Bureau of the Census, 1942.

9 Marriage and Divorce in the United States, 1937 to 1945, National Office of Vital Statistics, Vital Statistics—Special Reports, Vol. 23, No. 9, 1946.

10 See “Monthly Vital Statistics Report,” Vol. I, 1952 and Vol. II, 1953.

11 Statistics on Marriages: Specified States, 1950, National Office of Vital Statistics, Vital Statistics—Special Reports, Vol. 37, No. 5, 1952.

12 Statistics on Divorces and Annulments: Specified States, 1950, National Office of Vital Statistics, Vital Statistics—Special Reports, Vol. 37, No. 4, 1952.

13 Variations in Marriage Licenses, National Office of Vital Statistics, Vital Statistics—Special Reports, Selected Studies, Vol. 33, No. 12, 1952.










also been included in the annual volumes of the National Office of 
Vital Statistics.¹⁴

Turning specifically to questions of improving national marriage 
and divorce statistics it seems clear that to provide comparability for 
figures prepared under many independent jurisdictions located in every 
part of the country there must be careful agreement regarding standard 
procedures, report forms, and definitions of important terms. While
many individual States have excellent statistical programs, consider- 
able difficulty is encountered in bringing these together into meaningful 
national statistics because of the variation in the State report forms. 
Thus, on the majority of State marriage certificates there is an item 
concerned with “number of marriages.” This is variously worded 
“number of previous marriages,” “number of marriages,” “number of 
proposed marriage,” and “any prior marriage.” About one-third of the 
State certificates do not contain this item. Similar variations may be 
noted in questions on “occupation,” “birthplace,” and several other 
items. There is nothing surprising about these variations in wording 
except that they are not more extensive. 

Progress is being made toward the necessary marriage and divorce 
standard certificates (or statistical report forms). The Working Group 
on Marriage and Divorce Registration of the Conference, which in- 
cludes some of the leading State registrars as well as representatives 
of important users of marriage and divorce statistics, has prepared 
suggested minimum lists of items to be included on the standard 
certificates. In order to have available a comprehensive picture of the needs of consumers and producers of marriage and divorce statistics, questionnaires listing the minimum items suggested by the Working Group, and a few frequently proposed additions, were distributed to
a representative list of users of these statistics and to all the State 
registrars of vital statistics. 

The response was excellent. From the registrars, or their statisticians, 
responses were received from all but one of the registration areas, and 
so it is possible to tabulate with some confidence the opinions of the 
persons who will have primary responsibility for carrying out the regis- 
tration and statistical program. The response to the questionnaires 
was also good from consumers of marriage and divorce statistics with 
45 per cent of these questionnaires being returned. They went to sociol- 
ogists especially interested in marriage and the family, to demographers 
who are members of the Population Association of America and believed 





14 See “Vital Statistics of the United States,” Part I, National Office of Vital Statistics, Public 
Health Service, Department of Health, Education, and Welfare. Government Printing Office, 1948-51. 







to be interested in these problems, and to representatives of a few 
private agencies and federal government agencies who are thought to 
have a special interest. We call these the consumer group, although the 
list is far from complete, an important omission being the consumers 
in business organizations. There will be other samplings of the opinions 
of consumers of these statistics, and we shall welcome suggestions of 
persons or groups that should be queried. 

On most of the items there was a large measure of agreement among 
the respondents. This is natural since there is a clear need for such 
items of identification as name, place of residence, and age. Users of 
these statistics frequently request data, now unavailable, on the 
social characteristics of the persons involved. Disagreement is noted 
when proposed questions are not clearly needed for identification pur- 
poses. Two possible items will serve to illustrate the problems of 
preparing good report forms in this field: “occupation and industry” 
and “last grade of school completed.” On both the marriage and 
divorce questionnaires these items were listed and respondents were 
invited to comment, as well as to check “yes or no,” whether each item 
should be included in the standard certificates. 

The item “occupation and industry” was discussed in the Working
Group, and there was some support for it, though not majority sup- 
port. Moreover, several States have this item on existing forms. Re- 
sults of the questionnaires reflected divided sentiment in the registrar 
group: of those expressing a positive opinion, a small majority favored 
its inclusion on the marriage certificate, while sentiment was almost 
evenly divided regarding its inclusion on the divorce certificate. In the 
consumer group there was a strong majority for including the item 
“occupation and industry” on both certificates. On the other hand, for 
the suggested question “last grade of school completed” the registrar 
group was opposed to its inclusion by a large majority, while among the 
consumer group an even larger majority favored its inclusion. 

There are many sides to the question of whether a given item should 
be included on a standard certificate. One asks first who needs this 
information and for what purpose; obviously, information is not 
gathered to satisfy idle curiosity. Moreover, the ease and accuracy 
with which information can be recorded and tabulated are important 
since thousands of local officials in all parts of the country must record 
it. A number of experienced statisticians have pointed to the practical 
difficulties the “occupation and industry” item will raise. Perhaps these 
can be overcome as each State registrar gives detailed instructions to 
local officials in his State. On the other hand, a question concerning 
“last grade of school completed” would seem to offer slight possibility 
of being misunderstood and would be easy to tabulate. 

One must also ask whether the persons required to complete the 
forms will object to certain items. Will persons applying for a marriage 
license resent being asked to state the last grade of school completed, 
or to give their present occupation and industry? In the final analysis 
these and similar questions can only be answered by actual field tests. 

In response to a request accompanying the questionnaires, a large 
number of suggestions of new items to be included on the report forms 
were received. Consideration will be given first to the marriage ques- 
tionnaire. In the consumer group, nearly one-third of the suggestions 
asked for religious preference of the persons to be married. Other 
suggestions, in order of frequency, concerned physical qualifications 
for marriage such as results of health examinations, facts concerning 
children by previous marriage, details regarding the marital records 
such as date of first marriage, economic status such as income, various 
facts concerning parents, and other items. The registrar group men- 
tioned most frequently the desirability of more facts about the parents, 
such as name and birthplace. This suggestion was followed in frequency 
by items concerned with legal status and identification, religion, the 
marital record, items useful for follow-up activity to complete the 
records, and other items. 

The suggested additional items for the divorce form had many similarities
to the list for the marriage form. The consumer group stressed
religious preference, more than one-third of the total falling here. 
Other suggestions, in order of frequency, concerned the divorce action, 
such as facts about alimony, the children affected, the marital record, 
economic status, data regarding parents, birthplace, education, and 
other items. The registrar group suggested additional facts on the 
divorce action, regarding the children, religious preference, and other 
items. 

Since the forms must be reasonably brief and easy to complete and 
tabulate, it is clear that some difficult decisions must be made regard- 
ing items to be excluded. There are important differences in the sug- 
gestions received from the consumers and the producers of these sta- 
tistics. This was to be expected. In general, the State registrars, 
having special knowledge of local officials’ problems, place greater 
emphasis on items essential to identification and on the desirability 
of keeping the forms brief. The consumers, not concerned with local 
limitations, ask for items that will make possible the detailed break- 
down of the gross figures that will make them more meaningful. 

Results of the questionnaires will also be examined by the Working 
Group on Marriage and Divorce Registration, and the best possible 
compromise will be sought between the points of view of the producers 
and the consumers. Preliminary forms will be prepared and studied 
before issuing the standard certificates. It is not assumed that every 
State will use identical forms; it is hoped that every State form will 
contain all items considered essential. Probably many States will add
items to the standard list, or will retain the additional items now con- 
tained on their forms. 

Another major phase of the program concerns the plan to establish 
Registration Areas for marriage and divorce comparable to the long- 
familiar Birth and Death Registration Areas. The Registration Areas 
are the suggestion of the Working Group and grow out of the experi- 
ence of State registrars with improving other vital statistics. Many of 
the State registrars are very optimistic regarding the rate of growth of 
the Marriage and Divorce Registration Areas once they are established. 
It is planned to publish statistics from Registration Areas in greater 
detail than will be possible for the remaining States. Experience during 
the development of the Birth and Death Registration Areas indicates 
that the States desire to be members of a Registration Area because 
of the wide use made of the statistical reports for such areas. 

The original States that will make up the Marriage and Divorce 
Registration Areas will necessarily be limited to States with central 
files of marriage and divorce records. The accompanying table indi- 
cates which States, at this time, maintain central files of marriage and 
divorce records. 

Three fundamental questions may be raised concerning future de- 
velopments of this program: 

First.—How shall consistency checks be made of periodic reports
of marriages and divorces? Since a legal marriage or divorce inevitably 
requires formal registration, it may appear that there is no urgent 
need for consistency checks. Recording at the local level, of course, 
may not result in recording at State or federal levels; and consistency 
checks, with emphasis upon routine procedures and record-keeping, 
are essential. 

As a first step in this direction, the National Office of Vital Statistics,
in cooperation with the Working Group, is preparing a procedural
manual for use by State registrars. This will set forth the necessary 
steps for registration, follow-up by letter or field agent, record process- 
ing, and all aspects of the work needed to produce comprehensive 
national marriage and divorce statistics. Obviously, this manual must 


CENTRAL FILES OF MARRIAGE AND DIVORCE RECORDS

(States, Independent Registration Areas, and Territories indicating
those maintaining Central Files of Marriage and Divorce Records)

States: Alabama, Arizona, Arkansas, California, Colorado, Connecticut,
Delaware, Dist. of Columbia, Florida, Georgia, Idaho, Illinois, Indiana,
Iowa, Kansas, Kentucky, Louisiana, Maine, Maryland, Massachusetts,
Michigan, Minnesota, Mississippi, Missouri, Montana, Nebraska, Nevada,
New Hampshire, New Jersey, New Mexico, New York, North Carolina,
North Dakota, Ohio, Oklahoma, Oregon, Pennsylvania, Rhode Island,
South Carolina, South Dakota, Tennessee, Texas, Utah, Vermont, Virginia,
Washington, West Virginia, Wisconsin, Wyoming

Independent Registration Areas: New Orleans, New York City

Territories: Alaska, Hawaii, Puerto Rico, Virgin Islands

(a) Law for the central filing of divorce records recently enacted.

Whether all of these States can be counted upon to enter the original Registration Areas will depend
upon their ability to meet certain essential requirements. It will be necessary, of course, that
a State maintain a current file of marriage and divorce records, reasonably complete, and that the State
agree to provide statistical data to be used in preparing the national summaries.


represent the pooling of experience of many registration officials. 
Responsibility for carrying out periodic checks must rest with the 
State registrars since they have responsibility for the completeness 

and comprehensiveness of the vital statistics in the various States. 
In this important work the National Office of Vital Statistics will be 
glad to cooperate to the limit of its resources. In due time a body of 
useful knowledge that can be applied generally will be developed. 

Second.—How shall national marriage and divorce statistics be 
compiled? In recent years pretabulated data supplied by the State 
registrars have been the principal source of national statistics. The 
number of States that can provide the necessary detailed tables has 
been increasing and there is every reason to believe that this trend will 
continue. Pretabulation of data by the States is the simplest method 
and the least expensive. At the same time the chief burden is thrown 
on the State offices, which in some cases may not be in a position to 
carry it. It is also much more difficult to get consistency through pre- 
tabulated data and there is a certain inflexibility in this method. It 
makes impracticable any additional study of the basic records once the 
routine tabulations have been completed, as this would require that every
State prepare new tabulations. There are no present plans to modify 
this procedure. 

There is a second method by which national statistics could be 
compiled. Microfilm copies of marriage and divorce certificates could 
be purchased from each State and the tabulations made in the National 
Office. This method is familiar to State registrars who use it for birth 
and death certificates. Punch cards are prepared in the National Office. 
Punch cards of births in Illinois are prepared by that State’s Bureau
of Statistics, for use by the National Office. Several other States are 
considering the possibilities of such a cooperative program with the 
States furnishing punch cards of births to the National Office. 

Third.—To what extent can sampling methods be used in preparing 
national marriage and divorce statistics? One can answer this question 
only on the basis of actual field tests. It may appear that there is no
need for sampling since every effort is being made to secure registration 
of all marriages and divorces. However, even the most comprehensive 
report forms will leave many questions unanswered. Study of the 
suggested items written on the questionnaires makes this clear. The 
decennial population censuses make increasing use of sampling, and our 
situation is analogous. Some of the most important social characteris- 
tics can only be obtained on a sampling basis. Plans have been drawn 
to use the 25,000 households of the Current Population Survey of the 
Bureau of the Census. It may be possible, also, through sampling 
methods, to provide a check on the totals obtained by registration. It 
should be emphasized that the practical difficulties to be encountered 
can be determined only by field tests. 





SAMPLING THE FEDERAL OLD-AGE AND 
SURVIVORS INSURANCE RECORDS* 


B. J. MANDEL 
Bureau of Old-Age and Survivors Insurance 


INTRODUCTION 


At the end of 1952, records were available for about 100 million
accounts with some wage credits under the old-age and survivors
insurance program since January 1937; for more than ten million em- 
ploying organizations which had reported wages under the program; 
and for over five million individuals who were receiving either
retirement or survivors benefits under the program. It is apparent from
this quantitative picture of the vastness of these records that only a
sound and flexible system of sampling could tap the information
contained in them without undue expense and delay. The purpose of this
paper is to describe the sampling systems used for tabulating statistics 
from the old-age and survivors insurance records and the associated 
methods and problems of estimation. 

Some statistics become available on a 100-per cent basis, without 
extra cost, because they are part of the controls in the accounting op- 
erations. Thus, it is learned from this source that 3.6 million employing 
organizations made wage reports under the program for the first 
quarter of 1952; that some 53 million wage items were listed on these 
reports, with aggregate taxable wages amounting to $33 billion for the 
quarter; that 1.1 million employee account numbers were issued in the 
third quarter of 1952 and that 150,000 new employer identification 
numbers were assigned in that quarter. 

However, while these easily-obtained accounting totals furnish 
useful knowledge on the Bureau’s over-all workloads, none provides 
an insight into the operations of the program provisions, or indicates 
the wage and employment experience or other characteristics of the 
contributing workers, the size and industrial activity of employers, or 
the characteristics of beneficiaries under the program and the different 
amounts of their benefits. This knowledge of the operations of the 
program can be acquired only by studying the distributions of workers, 
beneficiaries and employers by the classifications contained in the basic 





* An adaptation of a paper on “Sampling the Social Security Records of the United States of 
America” submitted to the United Nations Statistical Commission, Sub-Commission on Statistical
Sampling for the Fifth session at Calcutta, India, December 19, 1951. The writer wishes to thank Irwin 
Wolkstein of the Bureau of Old-Age and Survivors Insurance for critically reading the paper and making 
valuable comments. 

records, such as age and insurance status of workers, amount of monthly 
benefits by State of residence of the beneficiary and industrial activity 
and size of business organization. As a rule, such detailed data are not 
essential for the accounting and benefit paying processes. Not being 
an integral part of these operations, therefore, the data can be obtained 
only by independent tabulations either of the entire universes or sam- 
ples drawn from them. 


THE UNIVERSES AND NEEDED FACTS 


To appraise the effectiveness of the social security provisions in 
providing economic security, to administer various aspects of the 
program, to formulate policy and legislation, and to forecast income to 
and outgo from the insurance fund, accurate and current information is 
needed on a variety of subjects. Vitally important is information on 
the number and characteristics of persons covered and insured under 
the program, as well as on the number and characteristics of persons 
and families receiving benefits. These two types of statistics are repre- 
sented by two distinct universes, namely, employees with wage credits 
and persons in receipt of benefits. Facts gleaned from the basic files 
for these individuals could shed light on such questions as: How many 
persons are contributing under the program and what is the amount 
of their contributions? How many have worked in covered jobs suffi- 
ciently long to be insured, and how much is their average monthly 
wage or potential benefit amount? How many insured persons are 
approaching the retirement age of 65? How many families are already 
in receipt of benefits and how much are they receiving? 

Also of importance, particularly in administrative planning, is infor- 
mation on the number and characteristics of employing organizations 
reporting under the program, the third universe for which statistics 
are tabulated. These tabulations could answer such questions as: How 
many new businesses are started in a month or a quarter? How many 
are discontinued? How many are currently operating and what are 
their characteristics, such as their size, industrial activity, location, 
aggregate employment and payrolls? 

Thus, altogether, statistics are needed about the size and charac- 
teristics of three different universes commonly known as employees, 
beneficiaries, and employers, each of which contains millions of
individual items. Tabulations from one of the three universes, the
employers, are currently being made on a 100-per cent basis, namely,
from the file of new employer numbers issued and the file of currently
reporting employers. (The file of business deaths is sampled on a
fifty-per cent basis.) In addition, a
special annual tabulation of beneficiaries by county of residence is 
made on a 100-per cent basis. The data from these 100-per cent tabu- 
lations are used widely as bench marks or censuses both in the Social 
Security Administration and in other government agencies. The costs 
of these tabulations have not been large in the light of the required 
detail and accuracy. Furthermore, the requirements could not have 
been met more economically through sampling. 

The remaining two universes, namely, workers and beneficiaries 
(with the above exception) are sampled both periodically and inter- 
mittently for special studies. 


SPECIFIC SAMPLING SYSTEMS AND SIZES 


The system of identification and controls.¹—A nine-digit account number
is used to identify all persons who receive wage credits under the
program, including those on whose account benefits are paid. Employ- 
ers making reports are also identified by a nine-digit number. In both 
cases, the number is issued generally at the time of first coverage under 
the program, and it serves as a basis for identifying the employer so 
long as he remains in business and the employee throughout his working 
lifetime, including the period of benefit payments to him or his depend- 
ents. The following is the composition of these numbers: 


Employee Account Number (000-00-0000)

000: Three digits representing the geographical area of issuance.
There are 612 area numbers in use at present.

00: Two digits representing a group or sequence of numbers. One
hundred groups of numbers (00 to 99) can be issued in any area.

0000: Four digits representing the serial number. 10,000 numbers can
be issued for each group in any one area. Therefore, one million
numbers can be issued in a single area.

Employer Account Number (00-0000000)

00: Two digits representing area of issuance. There are 68 such areas,
designated as Collector of Internal Revenue districts.

0000000: Seven digits representing the serial. Therefore, 10 million
numbers can be issued in any one district.
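The two layouts can be made concrete with a small parser. This is a minimal sketch for illustration only: the names `split_employee_number` and `split_employer_number` are hypothetical, the hyphenated input forms are an assumption about how the numbers are written, and the sample numbers are made up (318 lies in the Illinois range and 52 is the Maryland employer prefix cited in the text). The sketch only separates the fields and checks the digit count; it does not check whether an area number is actually in use.

```python
from typing import NamedTuple


class EmployeeNumber(NamedTuple):
    area: str    # 3 digits: geographical area of issuance
    group: str   # 2 digits: group 00 to 99 within the area
    serial: str  # 4 digits: serial within the group


class EmployerNumber(NamedTuple):
    district: str  # 2 digits: Collector of Internal Revenue district
    serial: str    # 7 digits: serial within the district


def _nine_digits(text: str) -> str:
    """Drop hyphens and insist on exactly nine decimal digits."""
    digits = text.replace("-", "")
    if len(digits) != 9 or not digits.isdecimal():
        raise ValueError(f"not a nine-digit number: {text!r}")
    return digits


def split_employee_number(text: str) -> EmployeeNumber:
    d = _nine_digits(text)
    return EmployeeNumber(area=d[:3], group=d[3:5], serial=d[5:])


def split_employer_number(text: str) -> EmployerNumber:
    d = _nine_digits(text)
    return EmployerNumber(district=d[:2], serial=d[2:])


# Capacities implied by the layout.
NUMBERS_PER_AREA = 100 * 10_000  # 100 groups x 10,000 serials
NUMBERS_PER_DISTRICT = 10 ** 7   # seven-digit serial

print(split_employee_number("318-27-2505"))
# EmployeeNumber(area='318', group='27', serial='2505')
print(split_employer_number("52-0001234"))
# EmployerNumber(district='52', serial='0001234')
```

The serial (the last four digits of the employee number) is the part the sampling scheme works on.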


When the Social Security Administration adopted a numbering system
to identify employees and employers, it also set up procedures
for the issuance of employee numbers with predetermined three digits, 





1 For a fuller description, see “Technical Problems Involved in the Administration of Social
Security Schemes—Manual of Methods of Identification of Insured Persons and Organization of
Records”—National Monograph Number I, pages 209-216, published by the United Nations.

and two digits in the case of employers, in specific areas. Thus, for 
example, all social security account numbers with 318 through 361 as 
the first three digits were issued to individuals in the State of Illinois 
and all employer identification numbers with 52 as the first two digits 
were issued to employers in Maryland. It also set up procedures for
avoiding the issuance of more than one number to an individual or 
employer, by a screening process, and for issuing numbers in strict 
numerical sequence [1]. 

The more than 500 field offices of the Bureau of Old-Age and Survivors
Insurance throughout the country, including those in Alaska, Hawaii, Puerto
Rico, and the Virgin Islands, issue account numbers to individuals as they
apply for them. Controls over the specific numbers to be issued and the 
methods of issuance are set up and maintained by the Bureau’s central 
office in Baltimore. At present the account numbers are released to the
field offices in multiples of 500, of which, for reasons explained later,
20 per cent contain either a “2” or “7” in the first place of the serial.
Thus, if a field office is assigned 500 numbers to issue, 100 numbers are 
in the “2” or “7” series; if an office is assigned 5,000 numbers, 1,000 
numbers contain either a “2” or “7” in the first place of the serial. The 
field offices must issue numbers consecutively, starting with the lowest 
number of the series assigned to them. Prior to October 1940, blocks of 
numbers of unspecified sizes arranged in numeric sequence were re- 
leased to field offices. Furthermore, some of the large employing organi- 
zations were given whole clusters of numbers to issue directly to their 
employees, so as to relieve the initial registration workload on the 
administration. Because clusters of consecutive numbers were issued 
to groups of people in the same employing organization, some serial 
correlation was introduced. However, it is logical to assume that the 
effects of this serial correlation have been substantially reduced over 
the past fifteen years, because of interemployer and interindustry shifts 
of employees, deaths and retirements [2]. Of course, serial correlation 
is introduced even without employer issues, because of the issuance of 
numbers in numerical sequence. This correlation also diminishes with 
time. 

Method of sampling and sample sizes.—The system of sampling the
social security records for employees and beneficiaries is based on the 
last four digits (the serial) in the account number and is geared to the 
administrative operations, so as to derive statistics wherever possible 
as a by-product of the accounting work. Consequently, sample selection 
for obtaining data on employees is generally restricted to a sub-universe 
of 20 per cent of the accounts to which wages are posted for a full 

calendar year at one time. The Bureau of Old-Age and Survivors In- 
surance split up its entire file of accounts into four parts, so that the 
job of posting wages to the employee’s credit could be spread out uni- 
formly over the twelve months of the year, instead of accumulating a 
peak posting load once each year and experiencing slacks during 
other parts of the year. The posting group which could provide 
calendar-year data as a by-product of the accounting operations was 
chosen as the sub-universe for sampling. The remaining three groups 
included wages for four calendar quarters which spanned two consecu- 
tive calendar years. Since most economic analyses are based on a 
calendar year rather than on any other twelve-month period, this group 
was preferred over the others that were available. 

By virtue of previous planning, this sub-universe consists of all 
accounts having “2” or “7” as the first digit of the serial, or a 20-per 
cent sample. Selection was made on the basis of the first digit in the 
serial to economize on the sorting operations. Since all accounts are 
filed in numerical sequence for accounting purposes, blocks of 1,000 
accounts could be withdrawn at a single time for statistical tabulations 
without additional sorting, by choosing a digit in the first place of the 
serial. It should be noted that this procedure yields a sub-universe 
for sampling which is composed of clusters of 1,000 numbers each, 
which, under current procedures, are made up of smaller clusters of 
100, 200, 300, and so forth up to a maximum of 1,000 numbers issued
consecutively. At this time, the sub-universe of accounts with wage 
credits includes over twenty million accounts and is too large for 
tabulation of data on employees. Consequently, samples of different 
sizes are drawn from it to provide data of different kinds. 
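Choosing the sub-universe by the first serial digit can be checked on a single group of 10,000 serials. In this sketch the helper `in_sub_universe` is hypothetical and a serial is treated as an integer from 0 to 9,999. It shows the 20-per cent share and the two unbroken runs of 1,000 consecutive numbers that make withdrawal from a numerically ordered file possible without extra sorting.

```python
def in_sub_universe(serial: int) -> bool:
    """True when the first digit of the four-digit serial is 2 or 7."""
    return serial // 1000 in (2, 7)


selected = [s for s in range(10_000) if in_sub_universe(s)]

# 2,000 of every 10,000 serials: a 20-per cent sub-universe.
share = len(selected) / 10_000

# Group the selected serials into runs of consecutive numbers.
runs = []
for s in selected:
    if runs and s == runs[-1][1] + 1:
        runs[-1][1] = s          # extend the current run
    else:
        runs.append([s, s])      # start a new run

print(share, runs)  # 0.2 [[2000, 2999], [7000, 7999]]
```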

The largest sample for employee data is known as the One-Per Cent 
Continuous Work History Sample [3], presently comprising about one 
million accounts nationally. This sample provides information on the 
wage and employment characteristics and other classifications of the 
100 million accounts over the working life of the individual beginning 
with 1937. Another sample, the One-Per Cent Annual Employee 
Sample, which is part of the Continuous Work History Sample, con- 
tains about 500,000 to 600,000 accounts nationally and furnishes 
information on the industrial distributions, earnings, age and other 
wage and employment characteristics of the workers in the latest year. 
A third sample, known as the One-Tenth Per Cent Advance Sample and
drawn as a sub-sample of the work history sample, includes about
100,000 accounts nationally and provides selected quarterly, annual 
as well as work history data on employees under the program needed 
for current and long-range estimating. On some special occasions use 
is made of a national 0.02-per cent sample comprising about 20,000 
accounts.²

All of these varying-sized samples were selected from the 20-per
cent sub-universe, and smaller-sized samples were selected as internal
segments of the larger-sized samples, so as to economize as much as 
possible on sample selection and maintenance. Thus, a four-per cent 
sample used some ten years ago for tabulating 1937-40 work history 
data was a 20-per cent sample of the 20-per cent sub-universe, comprising
accounts with the digits “0” or “5” in the last column of the serial.
This method included two account numbers out of every 10 in the sub- 
universe, and provided a systematic 20 per cent sample from the sub- 
universe of 20 per cent, or a four-per cent sample. 
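The four-per cent rule (first digit 2 or 7, last digit 0 or 5) can be sketched the same way. The function names are hypothetical. The point is that inside the sub-universe the rule takes every fifth serial, so the result is a systematic sample.

```python
def first_digit(serial: int) -> int:
    """First place of a four-digit serial (0 to 9,999)."""
    return serial // 1000


def last_digit(serial: int) -> int:
    return serial % 10


def in_four_per_cent(serial: int) -> bool:
    # 20-per cent sub-universe, then 2 of the 10 possible last digits.
    return first_digit(serial) in (2, 7) and last_digit(serial) in (0, 5)


sample = [s for s in range(10_000) if in_four_per_cent(s)]
print(len(sample) / 10_000)  # 0.04
print(sample[:4])            # [2000, 2005, 2010, 2015]

# Within the sub-universe, consecutive selections are 5 serials apart.
sub_universe = [s for s in range(10_000) if first_digit(s) in (2, 7)]
print(sub_universe.index(2005) - sub_universe.index(2000))  # 5
```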

Only the first and last digits of the serial number were relied on to 
get 20-per cent and four-per cent samples. However, for the three-per 
cent samples used for tabulating 1941-44 annual employee data, and 
the one-per cent, 0.1-per cent and 0.02-per cent samples used currently, 
selection was on the basis of internal as well as external digits of the 
serial. The three-per cent sample was obtained by splitting the four-per 
cent sample into two segments of one per cent and three per cent. The 
one-per cent segment was composed of accounts with a “2” or “7” in 
the first place of the serial and 05, 20, 45, 70, or 95 as the last two digits 
of the serial. Since the eighth and ninth places of the account number
for persons in the four-per cent sample contained 20 possible numbers
(namely, 00, 10, ..., 90 and 05, 15, ..., 95), selection of five of them
provided a fourth of the four-per cent sample, or one per cent. The
three-per cent sample, of course, was the residual segment after the
one-per cent segment was sorted out of the four-per cent sample.
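The split can be verified by enumeration. In this sketch (names hypothetical), the one-per cent segment keeps 5 of the 20 possible endings in the four-per cent sample, and the three-per cent sample is simply the set difference.

```python
ONE_PER_CENT_ENDINGS = {5, 20, 45, 70, 95}  # last two serial digits: 05, 20, 45, 70, 95


def in_four_per_cent(serial: int) -> bool:
    return serial // 1000 in (2, 7) and serial % 10 in (0, 5)


def in_one_per_cent(serial: int) -> bool:
    return serial // 1000 in (2, 7) and serial % 100 in ONE_PER_CENT_ENDINGS


four = {s for s in range(10_000) if in_four_per_cent(s)}
one = {s for s in range(10_000) if in_one_per_cent(s)}
three = four - one  # residual segment after the one-per cent is sorted out

print(len({s % 100 for s in four}))  # 20 possible endings
print(len(one), len(three))          # 100 300
assert one <= four                   # the one-per cent segment is nested
```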

To obtain the 0.02-per cent sample, the first step was to select from 
the aforementioned five groups in the one-per cent sample, the group 
that contained the digits 05 in the last two places. One-fifth of the one- 
per cent sample, or a 0.2-per cent sample, was thus obtained. Selecting 





2 While the percentage sample size of all these samples is small, the samples are actually large in 
terms of absolute size. These large samples are justified by the multiple purposes they serve and the great
variety of classifications and cross-classifications for which data are needed. For example, in order to
estimate the number of old-age retirement claims in each field office, data are needed on the number of
workers aged 60-64 who are insured under the program. While this number was 35,000
in the one-per cent sample as of January 1950 for the entire country, the average per field office was
about 70 workers. As another example, data were needed for a cancer research project on the number of
workers in the rubber industry in Ohio in 1949, by age and sex. The one-per cent sample contained a total
of 900 workers in this study. Experience has shown that it is less expensive to maintain a sample on a
mechanical basis so as to serve many purposes than to vary the size of sample to meet different re- 
quirements. However, the need for continued maintenance of the large sized samples is regularly re- 
examined and further research on the optimum sample sizes is continuing in the Bureau. 

from this segment only the accounts with the digit 5 in the seventh
place of the account number (or second place in the serial) yielded
one-tenth of the 0.2-per cent segment, or a sample of 0.02 per cent. This is
a systematic sample, since it consists of every five-thousandth number 
in the account number population and is selected proportionately from 
each area. 
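The same enumeration reproduces the 0.02-per cent sample. Within each group only serials 2505 and 7505 qualify, and when consecutive groups are laid end to end they fall exactly 5,000 positions apart, which is the "every five-thousandth number" property. The function name is hypothetical, and laying groups end to end assumes numbers are counted in strict numerical sequence, as the text describes.

```python
def in_two_hundredths(serial: int) -> bool:
    """0.02-per cent rule: 2 or 7, then 5, then 05 in the last two places."""
    first = serial // 1000
    second = (serial // 100) % 10
    last_two = serial % 100
    return first in (2, 7) and second == 5 and last_two == 5


serials = [s for s in range(10_000) if in_two_hundredths(s)]
print(serials)  # [2505, 7505]

# Treat three consecutive groups as one run of numbers in numerical order.
positions = [g * 10_000 + s for g in range(3) for s in serials]
gaps = {b - a for a, b in zip(positions, positions[1:])}
print(gaps)  # {5000}
```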

Recently, the 0.02-per cent sample was increased to a 0.1-per cent 
sample. This was accomplished by adding to the 0.02-per cent sample, 
which already contained the digits 2505 and 7505, all the additional 
numbers which had a “5” in the second place of the serial out of the 
foregoing one-per cent sample, namely, the digits 2520, 2545, 2570, 
2595, 7520, 7570, 7545, and 7595. 
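The enlarged sample is the one-per cent rule with the extra condition of a 5 in the second place of the serial. In this sketch (names hypothetical), enumeration yields ten serials per 10,000, a 0.1-per cent sample, and confirms that the earlier 0.02-per cent serials are retained.

```python
ONE_PER_CENT_ENDINGS = {5, 20, 45, 70, 95}


def in_one_per_cent(serial: int) -> bool:
    return serial // 1000 in (2, 7) and serial % 100 in ONE_PER_CENT_ENDINGS


def in_one_tenth(serial: int) -> bool:
    # The one-per cent rule plus a 5 in the second place of the serial.
    return in_one_per_cent(serial) and (serial // 100) % 10 == 5


tenth = sorted(s for s in range(10_000) if in_one_tenth(s))
print(tenth)
# [2505, 2520, 2545, 2570, 2595, 7505, 7520, 7545, 7570, 7595]
print(len(tenth) / 10_000)  # 0.001
assert {2505, 7505} <= set(tenth)  # the 0.02-per cent sample is retained
```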

The chart on page 469 shows the specific digits in the serial number 
which are used for selecting the various sized samples of employees 
and beneficiaries. 

It is, of course, not necessary to use precisely this same combination 
of numbers in the serial to obtain systematic samples of the specified 
sizes. However, in deciding on the specific digits, considerations of 
economy in sorting have dictated the choice. With the file already in 
numerical order, the first digit in the serial was used to obtain the sub- 
universe in order to avoid extra sorting. Furthermore, where sorting 
costs are equal, digits are selected so as to give as wide a dispersion as 
possible, in order to reduce the effects of serial correlation. 

The 20-per cent sample from which are tabulated most of the data 
on beneficiaries under the program consists of the accounts that fall 
in the sub-universe of 20-per cent. Thus, all accounts with a “2” or 
“7” in the first place of the serial on which either old-age or survivors 
benefits are paid are included in the sample.³ This sample may be
described as a systematic sample of clusters in order by area and date 
of issuance of the account number. There are two other samples which 
are digitally selected in the same way, namely, the 20-per cent and one- 
per cent samples of persons who receive account numbers each quarter. 


METHODS OF ESTIMATION AND MEASURING ERROR 


Estimating totals.—Both sampling and other types of variations need
to be taken into account in preparing estimates from the sample 
tabulations. Furthermore, the methods of estimation differ, depending 
upon the uses of the data, costs and availability of control totals. 
Because the old-age and survivors insurance digital samples are of 





3 Plans have recently been made to reduce this sample to ten per cent by selecting the following 
digits in the serial: “2” and “7” in the first column and “0”, “2”, “5”, “7” and “9” in the last column, 

SPECIFIC SERIAL NUMBER DIGITS USED TO SELECT
STATISTICAL SAMPLES OF EMPLOYEES AND
BENEFICIARIES UNDER THE FEDERAL
OLD-AGE AND SURVIVORS PROGRAM

Size of Sample     Digits in the Serial Used for Selection
Universe           All digits in the serial
20 per cent        2 or 7 in first place
4 per cent         0 or 5 in last place
1 per cent         20, 70, 05, 45, or 95 in last two places
0.1 per cent       520, 570, 505, 545, or 595 in last three places
0.02 per cent      505 in last three places


uniform size throughout all classifications, methods of estimation 
from the sample data, for unrefined uses of the data (e.g. in determin- 
ing general relationships and magnitudes), are usually simple and 











470 AMERICAN STATISTICAL ASSOCIATION JOURNAL, SEPTEMBER 1953 


straightforward, namely, by use of the reciprocal of the sampling ratio.
Thus, the 20-per cent sample of newly issued accounts can be inflated
to the universe total by a multiplier of five. The one-per cent sample of
workers with wage credits can be similarly inflated by adding two
zeros to the sample figures (both of workers and wages). Likewise, the
0.02-per cent and the 0.1-per cent sample data can be multiplied by
5,000 and 1,000, respectively, to yield estimated totals. Since the old-
age and survivors insurance samples are self-weighting, derivative
measures, such as averages, percentages, ratios, coefficients of
variation, and other statistical and analytical measures, are computed
directly from the sample distributions.
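The inflation just described multiplies every cell by the same constant. Percentages and other ratios can therefore be taken from the sample directly. Below is a minimal Python sketch using the multipliers quoted above; the cell names and counts are hypothetical.

```python
# Hypothetical sketch of inflation by the reciprocal of the sampling ratio.
# Multipliers are the ones quoted in the text.
INFLATION_FACTOR = {20.0: 5, 1.0: 100, 0.1: 1_000, 0.02: 5_000}


def inflate(sample_cells: dict[str, float], sample_percent: float) -> dict[str, float]:
    """Estimate universe totals for every cell of a self-weighting sample."""
    factor = INFLATION_FACTOR[sample_percent]
    return {cell: value * factor for cell, value in sample_cells.items()}


def percent_distribution(cells: dict[str, float]) -> dict[str, float]:
    """Percentage distribution; unchanged by a common multiplier."""
    total = sum(cells.values())
    return {cell: 100.0 * value / total for cell, value in cells.items()}


workers = {"male": 3, "female": 7}   # hypothetical one-per cent sample counts
estimated = inflate(workers, 1.0)    # {'male': 300, 'female': 700}
assert percent_distribution(workers) == percent_distribution(estimated)
```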

While inflation of the sample data for the many thousands of cells
to obtain statistics on approximate magnitudes is accomplished simply
by using the appropriate sampling ratio, as indicated above, other
methods are relied on when greater precision is needed in estimates
for selected cells. One departure from the sampling-ratio method
occurs when an actual universe total is available as a by-product of
the accounting or claims control operations. In such instances,
estimates are prepared not by the reciprocal of the sampling ratio
but by ratio estimation.

One important universe control figure, derived from accounting data,
is the total amount of taxable wages paid in each calendar year. For
selected estimates, this control figure is divided by the figure on total 
taxable wages in the sample, and the resulting ratio is multiplied by 
the sample figures on wages. Thus, a ratio estimate is used in inflating 
sample data on wages of workers in selected classifications. It would, 
of course, be desirable to use this method for inflating all sample cells, 
but this would be too expensive to do for the thousands of cells tabu- 
lated. 
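A ratio estimate of this kind can be sketched as follows. The cell names and dollar figures are hypothetical. The point of the example is that the ratio estimates add up exactly to the control total, whereas simple inflation by 100 need not.

```python
# Hypothetical sketch of the ratio estimate described above.
def ratio_estimate(sample_cells: dict[str, float],
                   sample_total: float,
                   control_total: float) -> dict[str, float]:
    """Scale sample cells by (universe control total) / (sample total)."""
    ratio = control_total / sample_total
    return {cell: value * ratio for cell, value in sample_cells.items()}


# Illustrative figures only (taxable wages, $ millions, one-per cent sample).
sample_wages = {"manufacturing": 600.0, "trade": 300.0, "other": 100.0}
control = 102_000.0  # hypothetical universe total from accounting data

by_ratio = ratio_estimate(sample_wages, sum(sample_wages.values()), control)
by_reciprocal = {k: v * 100 for k, v in sample_wages.items()}
print(by_ratio["manufacturing"], by_reciprocal["manufacturing"])  # 61200.0 60000.0
```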

In dealing with the 20-per cent sample of beneficiaries and previously
employed persons represented in benefit awards, where the number of
cells for which data are published is smaller than in the case of the
employee samples, universe totals for each type of benefit (six types,
such as retired worker, aged wife, widow, etc.), obtained from the
claims control records, are used. Thus, the 20-per cent sample
data are inflated to a 100-per cent basis by means of several different
universe control figures, which provide separate ratios for the
different type-of-benefit groups.
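For the beneficiary sample, the same idea is applied separately within each type of benefit. In this hypothetical sketch, cells are keyed by (benefit type, subclass). Only two of the six benefit types are shown, and the counts and control totals are invented.

```python
from collections import defaultdict


# Hypothetical sketch: a separate ratio for each type-of-benefit group.
def inflate_by_group(sample_counts: dict[tuple[str, str], int],
                     control_totals: dict[str, int]) -> dict[tuple[str, str], float]:
    """Cells are keyed (benefit_type, subclass); each type has its own control total."""
    sample_by_type: dict[str, int] = defaultdict(int)
    for (benefit_type, _), n in sample_counts.items():
        sample_by_type[benefit_type] += n
    ratios = {t: control_totals[t] / n for t, n in sample_by_type.items()}
    return {cell: n * ratios[cell[0]] for cell, n in sample_counts.items()}


# Illustrative counts only; the text names six benefit types.
sample = {
    ("retired worker", "65-69"): 120, ("retired worker", "70 and over"): 80,
    ("widow", "under 65"): 30, ("widow", "65 and over"): 20,
}
controls = {"retired worker": 1010, "widow": 240}
estimates = inflate_by_group(sample, controls)
```

Within each benefit type the estimates add up to that type's control total.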

In the case of the samples of employees, the estimates for selected 
cells are frequently adjusted for the exclusion from the sample tabula-
tion of late-processed or delinquently-filed wage reports; in other
words, the reports which are not on file at the time of sample selection
and tabulation. These adjustments are generally made after supple-
mentary data are obtained from special sample tabulations of
information filed in late reports. In some instances, the estimates for
selected classifications are adjusted on the basis of tabulations of de-
linquently-filed or late-processed reports for previous years, on the
assumption that the same proportionate adjustment as in previous years
applies to the current-year data. Thus, in making estimates from the
social security samples of workers and beneficiaries, most of the prob-
lems of measuring the bias due to late response are manageable as a
result of the tabulation and analysis of late-filed tax returns.⁴
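The second adjustment carries an earlier year's proportionate correction forward, which reduces to one multiplication per cell. This is a sketch under that stated assumption; the function names and figures are hypothetical.

```python
# Hypothetical sketch: carry forward the proportionate late-filing adjustment
# observed for an earlier year to the current year's on-time tabulation.
def late_filing_factors(prev_on_time: dict[str, float],
                        prev_final: dict[str, float]) -> dict[str, float]:
    """Ratio of final (including late reports) to on-time figures, by cell."""
    return {cell: prev_final[cell] / prev_on_time[cell] for cell in prev_on_time}


def adjust(current_on_time: dict[str, float],
           factors: dict[str, float]) -> dict[str, float]:
    """Assume the same proportionate adjustment applies this year."""
    return {cell: value * factors[cell] for cell, value in current_on_time.items()}


factors = late_filing_factors({"construction": 980.0}, {"construction": 1000.0})
print(adjust({"construction": 1470.0}, factors))  # roughly 1500 for construction
```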

One additional factor for which correction is necessary in the
estimates of employees is the issuance of multiple accounts. As previ-
ously mentioned, the sample to provide data on employees is drawn
from the universe of social security accounts. In estimating the number
of workers, it is not known precisely to what extent the sample of
accounts with wage credits differs from a sample of individual workers,
because of the existence of an unknown number of multiple accounts.
Despite all efforts to avoid issuing more than one number to an indi-
vidual, it is known that an undetermined number of persons, for one
reason or another, have more than one number. That fact alone, of
course, would be no cause for concern in using the worker data, were
it not for the fact that some of these persons have wages credited to
more than one account and, therefore, have a greater chance of
being included in the old-age and survivors insurance samples. Infla-
tion by use of the reciprocal of the sampling ratio therefore causes some
overstatement in the estimated number of workers. In addition, wage
credits per worker are understated, because in some cases
an individual’s wages are credited to two or more accounts which are
not combined in the sample. Many of these multiple accounts are
discovered as part of the regular accounting operations, thus forming a
basis for measuring part of the bias [4] and partly adjusting the esti-
mates. A special study is currently under way to develop more infor-
mation about the bias in the estimates of workers due to multiple
accounts.
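A toy example shows the two effects of multiple accounts described above. Purely for illustration, it assumes that the person behind each sampled account is known. The records and identifiers are invented.

```python
# Hypothetical sketch: how multiple accounts distort worker figures.
# Each sampled record is (account_number, person_id, wages). The person_id
# is assumed known here only for illustration; the sample itself lacks it.
def account_vs_person(records: list[tuple[str, str, float]]) -> dict[str, float]:
    wages_by_person: dict[str, float] = {}
    for _account, person, wages in records:
        wages_by_person[person] = wages_by_person.get(person, 0.0) + wages
    total_wages = sum(w for _, _, w in records)
    return {
        "accounts": len(records),
        "persons": len(wages_by_person),
        "mean_wages_per_account": total_wages / len(records),
        "mean_wages_per_person": total_wages / len(wages_by_person),
    }


sample = [("A1", "p1", 1200.0), ("A2", "p2", 800.0),
          ("A3", "p3", 500.0), ("A4", "p3", 700.0)]  # p3 holds two accounts
summary = account_vs_person(sample)
# Counting accounts overstates workers (4 vs 3) and understates mean wages.
```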

Measuring sampling variability.—From the foregoing, it is evident
that the OASI samples are not unrestricted random samples. They are





4 There are also some relatively minor problems of non-response and incomplete response, generally 
representing less than one per cent of the employment and earnings totals. However, follow-up contacts 
are made with employers by the Bureau as part of the regular administrative operations and corrections 
are reflected automatically in the samples. Because the reports under the program are required by law, 
these problems are believed to be relatively insignificant.

usually two-stage samples. The first stage is the systematic selection 
of a sub-universe consisting of clusters of 1,000 items each from the 
universe of social security accounts. As previously indicated, this sub- 
universe is stratified by area. The next stage is the selection of digital 
systematic sub-samples from the sub-universe. These sub-samples 
inherit the area stratification of the sub-universe. To the extent that
such stratification operates to increase the variability within the
samples, it tends to reduce the sampling error below that of unre-
stricted random samples, as demonstrated by Lillian Madow [5]. On
the other hand, the selection of clusters of items for the sub-universe
operates to increase the sampling error. It is not known definitely
whether the sampling variance from this type of sample is greater or
smaller than that of unrestricted random samples [6]. Nevertheless,
in the absence of data to measure sampling variance by more acceptable
methods, the variances which would result from unrestricted random
samples have been used as an approximation of the sampling error
in the data. To improve on this method, plans are being made for meas-
uring sampling variance by the use of several sub-samples of equal size
to be taken from the one-per cent Continuous Work History Sample
[7]. Results from this study, however, are not expected to become
available for about a year.
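The two approaches to sampling error mentioned here can be contrasted in a short Python sketch. `binomial_se` is the unrestricted-random-sample approximation now in use. `replicated_se` estimates the standard error from the spread among several equal-size sub-samples. It is a generic replicated-sampling formula, offered as an assumption about the planned study, not as the procedure of [7]. The example proportion uses the 1950 one-per cent counts for men aged 25-29 from the table of male workers by age.

```python
import math
from statistics import variance


def binomial_se(p: float, n: int) -> float:
    """SE of a sample proportion if the sample were unrestricted random
    (the approximation currently used, per the text)."""
    return math.sqrt(p * (1.0 - p) / n)


def replicated_se(subsample_estimates: list[float]) -> float:
    """SE of the mean of k equal-size sub-sample estimates (k >= 2).

    The spread among sub-samples reflects the actual design, clustering
    and stratification included, rather than an assumed random sample.
    """
    k = len(subsample_estimates)
    return math.sqrt(variance(subsample_estimates) / k)


# Proportion of 1950 male workers aged 25-29 in the one-per cent sample.
p = 44_577 / 323_526
print(round(100 * binomial_se(p, 323_526), 3))  # SE in percentage points
```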

Sampling variability has been studied in a qualitative way by com- 
paring data derived from different-sized samples. It is believed on the 
basis of the comparisons between the data in the various samples that 
sampling error in the OASI data is not far from that which would be 
expected in random samples. 

Detecting non-sampling errors.—While studies to measure sampling
variability in quantitative terms are primarily in the planning stages,
studies of non-sampling error (such as coding, punching or tabulating
errors) have progressed further. There are several methods in use.
From time to time comparable data become available from different
samples, thus making possible comparison of estimates or distributions
derived from these samples and the computation of measures of sig-
nificance (such as chi-square tests). These tests of significance are
made despite the fact that they are not strictly applicable to the
OASI type of systematic samples, on the assumption that at a 99-per
cent confidence level they would generally aid in detecting probable
non-sampling errors. The following table illustrates this type of com-
parison of data derived from the one-per cent and one-tenth-per cent
samples.

It is apparent by inspection of the percentages that a very high
correspondence exists between the age distribution of employees for 
the same year as tabulated from the two samples. Statistical tests of 
differences did not reject the hypothesis (at a 99-per cent confidence 
level) that they came from the same universe. 


MALE WORKERS BY AGE, 1950, ONE-TENTH-
PER CENT AND ONE-PER CENT SAMPLES

                           Number               Percentage distribution
                  0.1-per cent   1-per cent    0.1-per cent   1-per cent
Age                  sample        sample         sample        sample

Total                32,312       323,526         100.0         100.0

Under 15                 58           765           0.2           0.2
15-19                 2,355        24,027           7.3           7.4
20-24                 4,095        41,738          12.7          12.9
25-29                 4,486        44,577          13.9          13.8
30-34                 4,138        41,030          12.8          12.7
35-39                 3,806        38,592          11.8          11.9
40-44                 3,474        33,919          10.8          10.5
45-49                 2,786        28,056           8.6           8.7
50-54                 2,388        23,961           7.4           7.4
55-59                 1,924        19,345           6.0           6.0
60-64                 1,500        14,665           4.6           4.5
65-69                   813         7,988           2.5           2.5
70-74                   337         3,237           1.0           1.0
75 and over             122         1,255           0.4           0.4
Unreported               30           371           0.1           0.1
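The chi-square comparison reported for this table can be reproduced from the printed counts. The sketch below computes the Pearson statistic for the 2 by 15 table. It compares the result with the 1-per cent point of chi-square on 14 degrees of freedom, 29.14 from standard tables. One caution follows from the digit rules: every serial in the 0.1-per cent sample also satisfies the 1-per cent rule. The two columns therefore overlap, and the test is, as the text says, only a rough screen for non-sampling errors.

```python
# Chi-square test of homogeneity for the age distributions in the table of
# male workers by age, 1950. Counts are copied from the table.
AGES = ["Under 15", "15-19", "20-24", "25-29", "30-34", "35-39", "40-44",
        "45-49", "50-54", "55-59", "60-64", "65-69", "70-74",
        "75 and over", "Unreported"]
TENTH_PCT = [58, 2355, 4095, 4486, 4138, 3806, 3474, 2786, 2388, 1924,
             1500, 813, 337, 122, 30]
ONE_PCT = [765, 24027, 41738, 44577, 41030, 38592, 33919, 28056, 23961,
           19345, 14665, 7988, 3237, 1255, 371]

CHI2_99_DF14 = 29.141  # upper 1-per cent point of chi-square, 14 d.f.


def chi_square_homogeneity(a: list[int], b: list[int]) -> tuple[float, int]:
    """Pearson chi-square for a 2 x k table; returns (statistic, d.f.)."""
    total_a, total_b = sum(a), sum(b)
    grand = total_a + total_b
    stat = 0.0
    for x, y in zip(a, b):
        row = x + y
        for observed, col_total in ((x, total_a), (y, total_b)):
            expected = row * col_total / grand
            stat += (observed - expected) ** 2 / expected
    return stat, len(a) - 1


for age, x, y in zip(AGES, TENTH_PCT, ONE_PCT):
    print(f"{age:12s} {100 * x / sum(TENTH_PCT):5.1f} {100 * y / sum(ONE_PCT):5.1f}")

stat, df = chi_square_homogeneity(TENTH_PCT, ONE_PCT)
print(f"chi-square = {stat:.1f} on {df} d.f.; reject at 1 per cent: {stat > CHI2_99_DF14}")
```

With these counts the statistic comes to about 12 on 14 degrees of freedom, well short of 29.14. This agrees with the conclusion stated in the text.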





One other type of study to detect non-sampling errors is made in
analyzing statistics on newly issued accounts to determine deviations
from the prescribed procedures for issuing numbers. As indicated
earlier, present procedures call for the issuance of 100 numbers contain-
ing the digit two or seven in the first place of the serial out of every
block of 500 numbers assigned to a field office, so as to reduce intra-
class correlation within the clusters of 1,000 in the 20-per cent sub-
universe used for sub-sampling [1]. Misapplication of this procedure is
detected by comparing the total accounts issued in each State with
the number of accounts containing the digit “2” or “7” in the first
place of the serial. The expected size of sample is compared with the
actual size, and deviations which exceed either the maximum or
minimum tolerance are investigated and remedial action is taken [2].
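The State-by-State check can be sketched as below. The text does not say how the maximum and minimum tolerances are set. The half-percentage-point band used here is therefore a hypothetical parameter, as are the State figures.

```python
# Hypothetical sketch of the issuance check: compare, State by State, the
# number of accounts whose serial begins with "2" or "7" with the expected
# 20 per cent of all accounts issued. The tolerance is an assumed parameter.
def flag_states(issued: dict[str, tuple[int, int]],
                tolerance: float = 0.005) -> dict[str, str]:
    """issued maps State -> (total accounts issued, accounts with 2 or 7 first)."""
    flags = {}
    for state, (total, with_sample_digit) in issued.items():
        expected = 0.2 * total
        slack = tolerance * total
        if with_sample_digit > expected + slack:
            flags[state] = "above maximum tolerance"
        elif with_sample_digit < expected - slack:
            flags[state] = "below minimum tolerance"
    return flags


print(flag_states({"State A": (10_000, 2_010), "State B": (10_000, 2_300),
                   "State C": (8_000, 1_500)}))
# {'State B': 'above maximum tolerance', 'State C': 'below minimum tolerance'}
```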







ADVANTAGES OF THE OASI TYPE OF SAMPLE 


Simplicity.—The main advantages of the type of digital sampling
used are apparent from the foregoing presentation. Foremost is the
fact that this type of sample is simple to select and understand. Because
the clerical staff which usually performs the machine operations of
sample selection is not trained in sampling, a simple, straightforward
selection procedure avoids complications and trouble spots. It is also
a simple process to inflate the sample data to the universe because the
sample is self-weighting.

Precision and accuracy.—A second good feature of the sampling sys-
tem is that it takes advantage of the method of issuing numbers to
yield area and time stratification and therefore gives better pre-
cision than if such stratification were not attained. Furthermore, con-
trol lists are available showing the account numbers which have been
issued. By checking the sample account numbers against these lists, 
missing account numbers can be added and incorrect ones removed. 
This leads to greater accuracy because it eliminates mechanical or 
clerical errors both of omission and commission. 

Flexibility.—The digital sample serves many purposes because of its
flexibility. For example, this type of sample is most appropriate for 
tabulating data on the work history and wage patterns of contributors 
under the program—a type of data essential for a continuing evalua- 
tion of the operations of the program. In compiling such data, it is 
necessary to have a sample which can be easily maintained by adding 
to it each year a sample of the new workers and by identifying deceased 
and retired persons. Replenishment of the sample by new workers is 
automatically accomplished by including a sample of the persons who 
currently receive new account numbers having the predetermined 
sample digits. Thus, for example, a person who obtains a new number 
having as its serial 2505 or 7505 is automatically added to the 0.02- 
per cent sample. Furthermore, by maintaining a large general-purpose 
sample the digital system readily yields smaller samples, as the need 
arises for specific purposes, by mechanical sorting on selected digits 
in the account number. Finally, the system affords a simple method of 
enriching the informational items about a given person covered under 
the social security program by adding information easily derived from 
the records maintained by other programs using the same identifica- 
tion system. For example, because the Railroad Retirement Board 
also relies on the nine-digit account number for identifying workers 
covered under its insurance program, it is relatively simple to add data
for the same workers from the Board’s digital sample into the OASI sample.
This leads to more complete statistical series because it provides fuller
information for persons with wage credits under both systems. Simi- 
larly, the State unemployment insurance programs rely on the nine- 
digit social security number to identify workers, and it is relatively 
simple to coordinate the statistical samples under the two programs. 
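Because both programs identify workers by the same nine-digit number, linking records amounts to a lookup on the account number. The sketch below shows this; the field names and the `rrb_` prefix are hypothetical.

```python
# Hypothetical sketch: enrich OASI sample records with another program's
# records for the same nine-digit account number. Field names are invented.
def link_by_account(oasi: dict[str, dict], other: dict[str, dict],
                    prefix: str) -> dict[str, dict]:
    linked = {}
    for account, record in oasi.items():
        merged = dict(record)
        for field, value in other.get(account, {}).items():
            merged[prefix + field] = value
        linked[account] = merged
    return linked


oasi_sample = {"123-45-2505": {"oasi_wages_1950": 2400}}
rrb_sample = {"123-45-2505": {"wages_1950": 900}}
print(link_by_account(oasi_sample, rrb_sample, "rrb_"))
# {'123-45-2505': {'oasi_wages_1950': 2400, 'rrb_wages_1950': 900}}
```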

Economy.—Economy is also a major advantage of the sampling
system. This is especially true if sample selection and tabulation are
integrated with appropriate accounting and administrative operations.
Economy in the compilation of data is also achieved because the
aforementioned statistical controls and lists bring inconsistencies to
light in the early stages and so forestall reviews that would otherwise
be needed when they showed up later.


CONCLUSIONS


The OASI system of sampling for statistical tabulations is based on 
an area-stratified sub-universe of systematically selected clusters with
systematic sub-sampling. Little research has been done on this type 
of sample design, and further study is necessary. The following are 
among the more important areas of research: 1) Estimating more pre- 
cisely than by the use of binomial tables the sampling error for the 
OASI type of sample. The procedure to be used [7] is expected to pro- 
duce more accurate estimates of sampling variation. 2) Reducing the 
size of the various samples so as to economize still further in the 
statistical program, without losing essential data and sources for special 
tabulations. 3) Measuring the bias that is introduced into the employee 
(but not beneficiary) data when a sample of accounts to which wage 
credits have been posted is used to determine the number and charac-
teristics of individual workers. The presence of multiple account
numbers creates this problem, which has been discussed earlier.


REFERENCES 


[1] “OASI Sampling Methods,” Social Security Bulletin, June 1951.

[2] Mandel, B. J., and Hearn, Saul D., “Sampling Variations in the Continuous
Work History Data of Old-Age and Survivors Insurance,” internal BOASI
report, February 1948.

[3] Perlman, J., and Mandel, B., “The Continuous Work History Under OASI,”
Social Security Bulletin, February 1944.

[4] Cornfield, Jerome, “On Certain Biases in Samples of Human Populations,”
Journal of the American Statistical Association (1942).

[5] Madow, Lillian H., “Systematic Sampling and Its Relation to Other Sam-
pling Designs,” Journal of the American Statistical Association (1946).

[6] Madow, William G., “On the Theory of Systematic Sampling, II,” The
Annals of Mathematical Statistics, 20 (1949).

[7] Quenouille, M. H., “Problems in Plane Sampling,” The Annals of Mathemati-
cal Statistics, 20 (1949).





STATISTICS IN CHEMICAL EXPERIMENTATION* 


C. DANIEL 
New York City 


I. NEED FOR A MANUAL OF STATISTICS FOR CHEMISTS 


EIGHTY-SEVEN per cent of the 140 respondents to a recent question-
naire (sent by the School of Chemical and Metallurgical Engineering
at Cornell to its graduates) indicated that a course in applied
statistics should be part of the regular training of engineers. All-day
sessions of the American Chemical Society, and of the American In-
stitute of Chemical Engineers, on statistical applications in their re-
spective fields, are heavily attended. The chemical and engineering
periodicals are publishing quite a number of papers showing examples
of the uses of statistics. The bibliography of Hader and Youden on Ex-
perimental Statistics [4], published in January, 1952, and containing
some 150 references covering a three-year period, can by now be con-
siderably extended. At least twenty statistical texts published in the
last few years clamor for the attention of the research worker in chem- 
istry. It may be concluded that chemists and engineers are becoming 
increasingly aware of the usefulness of statistical methods. 

Considering this situation, and in view of the competitive demands
of other developing fields, it is natural to ask for a book that will sum-
marize the statistical contribution in a short, easily understood, tech-
nically correct, comprehensive manual, to be well-printed, well-bound,
and inexpensively marketed. It seems entirely safe to state that such
a book will not appear. If it is short, it will not be technically correct;
if it is comprehensive, it will not be easily understood; and if it is well-
made, it will not be inexpensive. Put another way, a manual for the
use of workers who need a convenient source of ready reference will
be properly used only by those with considerable fluency in the field.
A manual can hardly be an introduction. A serious introduction in
this field cannot possibly be short if it is to carry the reader all the way
from weaning him away from his prejudices about the fortuitous and
the random to an understanding of modern experimental design.

But there is without doubt a need, and a growing one, for a manual 
of statistics, one that can be used by chemists who have had some time 
to think about statistical notions; one which gives standard practices 





* A review article on W. L. Gore’s Statistical Methods for Chemical Experimentation, New York:
Interscience Publishers, Inc., 1952, pp. vii, 210. $3.50.

and general rules, omitting derivations, but carefully including restric-
tions and assumptions. Such a book may be called a cook-book, but
that need not be a term of opprobrium, provided we can find the sta- 
tistical counterpart of Escoffier. 

Similar books applying statistical methods in other research fields
(agriculture, textiles, medicine, educational psychology) have been 
found to be very useful. In two important respects, chemical research 
is very like the substantive fields just mentioned. In the first place, 
many factors are thought to influence the outcome under study and 
the researcher wants to know about all of them. Secondly, it is quite 
common to find that duplicate runs (not duplicate readings, but seri- 
ous attempts to repeat results under similar conditions) do not check 
very well. 

It was indeed to these two aspects of agricultural and genetic re- 
search that R. A. Fisher addressed himself in developing his theory of 
the design of experiments. The extensions and simplifications of Fish-
er’s ideas by J. Neyman and E. S. Pearson make it entirely practicable
to explain the major useful tools of modern statistics to chemists with 
little likelihood of misuse. Several important steps in this direction have 
already been taken. W. J. Youden [7] has written an introduction to 
statistics for chemists that is as beguiling as it is authoritative. K. A. 
Brownlee [2], and O. L. Davies [3], have carried us some distance fur-
ther, but there are by now major omissions in both books (e.g., operating
characteristic curves, distribution-free statistics). 

With some experience of the interests of industrial chemists, and with 
some acquaintance with the current state of the art of statistics, it is 
not difficult to form a general picture of the contents of a useful man-
ual. Such a manual would outline the meaning of statistical tests of
significance, define and give operating characteristic curves for the
usual tests, and clarify, with examples drawn from chemistry, the sev-
eral warnings that must always accompany the reporting of “statis-
tically significant (or insignificant) differences” to non-statisticians. It
would also give, in terms entirely intelligible to most chemists, the
meaning and use of confidence intervals for a wide variety of cases
that are immediately usable.
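As an illustration of the operating characteristic curves the reviewer asks for, here is a sketch for the simplest case: a two-sided 5-per cent test of a mean when the standard deviation is known (a z-test). This is a simplification chosen for the example. The OC curve of Student's t-test requires the noncentral t distribution, which is not shown.

```python
import math


def normal_cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def oc_two_sided_z(delta: float, n: int, z_crit: float = 1.959964) -> float:
    """Probability of accepting H0 (mean = mu0) when the true mean is
    mu0 + delta * sigma, for a 5-per cent two-sided test on n observations."""
    shift = delta * math.sqrt(n)
    return normal_cdf(z_crit - shift) - normal_cdf(-z_crit - shift)


# Rows: true shift in units of sigma; columns: sample sizes 4, 9, 16.
for delta in (0.0, 0.25, 0.5, 1.0):
    print(delta, [round(oc_two_sided_z(delta, n), 3) for n in (4, 9, 16)])
```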

Such a manual would then, presumably, proceed to the statistical 
part of planning experiments aimed at measuring the effects of several 
factors simultaneously. The algebra of the analysis-of-variance calcu- 
lations for quite general linear hypotheses is by no means too difficult 
for most research chemists and chemical engineers. By paying atten-
tion to the assumptions (untested hypotheses) underlying each use
of the analysis of variance, it would be quite practicable to give re- 
search workers considerable aid in the planning and analyzing of some 
multifactor experiments. Little space would need to be given to the 
theory of the fundamental distributions, even though the results of 
that theory are often necessary for judging the results of experiments.

The chemist who is looking for a manual such as that described will 
be greatly disappointed in Gore’s book. It is the reviewer’s opinion, 
based on several careful readings, that none of the needs indicated 
above is satisfied by this work. The harm that such a poorly prepared 
(and poorly edited) book may do is not easily overestimated. Perhaps 
the major hurt will be to those few chemists who find themselves 
prejudiced against the general field by contact with the specific un- 
fortunate example. 


II. INTERSCIENCE MANUAL NO. 1 


This book is the first in a new series of manuals which will provide, 
according to the publishers, “a straightforward description of labora- 
tory procedures and methods for the evaluation and recording of ex- 
perimental results.” The manual contains seven chapters, two appen- 
dixes, a glossary, a bibliography, and a subject index. 

The Introduction, Chapter I, starts with a discussion of the scope 
of statistical methods. On page 1 the author writes, “Thus the appli-
cation of probability theory to define the nature of variability has led
to techniques, called ‘Statistical Methods,’ whose useful function is to 
measure the uncertainty in inductive reasoning based on experimental 
data. This measure of uncertainty is a probability based on only the 
data at hand.” Since the reader is presumably a chemist in a hurry to 
get on to the usable results, perhaps these rather opaque sentences 
should not be criticized too adversely. On the other hand, it does seem 
too bad to promise something at the very beginning which cannot be 
delivered, and which has in fact long since been dropped as an objec- 
tive by all statisticians. 

The second section of the first chapter summarizes “An Experiment 
in Variation,” giving data from duplicate determinations of per cent
moisture, by each of 6 analysts, on each of 5 samples from a nylon 
dryer, on each of two days. This experiment will be discussed later. 

Chapter II on Statistical Concepts starts with a section on Fre- 
quency Distributions. The first two sentences read, “A frequency dis- 
tribution is measurement data of more than one article, sample, time 
of measurement, or occurrence of similar classification. Frequency dis-
tributions may be divided into two types—‘populations’ or ‘universes’
and samples taken from those ideally infinite universes.” The distinc- 
tion that is intended is surely important, but the chemist-reader who is 
going through his first book on statistics will hardly be helped by being 
told that a frequency distribution “is measurement data,” or that the 
second type of distribution is “samples taken from these ideally in- 
finite universes.” The contrast then made between population param- 
eters and sample statistics is important and is clearly drawn, but un- 
fortunately the distinction, and the relation, is not carried through in 
the remainder of the book. The same symbol, s, is used for the popula- 
tion standard deviation and for the sample standard deviation, even 
in the table of areas under the normal curve. The equation stated to
be that of the normal distribution on page 16 would be more nearly
intelligible if in place of f/N we had some symbol for the probability
density, if in place of s we had σ, and if in place of z (the deviation of a
single value from the sample mean) we had X − μ.
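For comparison, here is the normal density in the notation the reviewer suggests. The symbol φ for the probability density is chosen here; μ and σ are the population mean and standard deviation.

```latex
\varphi(X) = \frac{1}{\sigma\sqrt{2\pi}}
  \exp\!\left[-\frac{(X-\mu)^{2}}{2\sigma^{2}}\right],
\qquad -\infty < X < \infty .
```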

Chapter III is on the Reliability of Estimates. Figure III-1, showing 
the distribution of single measurements, together with that of the av- 
erages of sets of four measurements, is misleading in that the area un- 
der the second curve is about one-tenth that under the former. The two 
areas should of course be the same. 
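The reviewer's point about Figure III-1 can be checked numerically. The sketch integrates the normal density for single measurements (σ = 1) and for averages of four (σ/2) with the midpoint rule. Both areas come out as 1, while the curve for averages is half as wide and twice as tall at its peak.

```python
import math


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def area(mu: float, sigma: float, lo: float = -10.0, hi: float = 10.0,
         steps: int = 20_000) -> float:
    """Midpoint-rule approximation to the area under the density on [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(normal_pdf(lo + (i + 0.5) * h, mu, sigma) for i in range(steps))


sigma = 1.0
sigma_mean_of_4 = sigma / math.sqrt(4)  # standard deviation of an average of 4
print(round(area(0.0, sigma), 6), round(area(0.0, sigma_mean_of_4), 6))  # 1.0 1.0
print(normal_pdf(0, 0, sigma_mean_of_4) / normal_pdf(0, 0, sigma))        # 2.0
```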

The definition of fiducial limits given on page 24 is: “The limits be- 
tween which one may have a given degree of confidence in what the 
true value (parameter) of a statistic will lie (sic) are called the fiducial 
limits.” There can hardly be many, chemists or others, who will find 
this language clear. 

The description of Student’s t-test is confused by the author’s pre- 
senting two formulas for estimating the variance of the estimated dif- 
ference between two means. The first adds the estimated variances of 
the two means, calculated separately; the second is the one given by 
Student and by Fisher. The author recommends the former, appar- 
ently on the grounds of ease of calculation. 
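The two variance estimates for a difference of means can be written out in a few lines. With equal sample sizes they are algebraically identical, which is why the book's example happens to be a t-test. With unequal sizes they differ. The data are hypothetical.

```python
from statistics import variance


def var_diff_separate(x: list[float], y: list[float]) -> float:
    """Add the separately estimated variances of the two means."""
    return variance(x) / len(x) + variance(y) / len(y)


def var_diff_pooled(x: list[float], y: list[float]) -> float:
    """Student-Fisher form: pooled variance on n1 + n2 - 2 d.f."""
    n1, n2 = len(x), len(y)
    pooled = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    return pooled * (1 / n1 + 1 / n2)


# Equal sample sizes: the two forms agree algebraically.
a, b = [1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 6.0, 8.0, 10.0]
print(var_diff_separate(a, b), var_diff_pooled(a, b))
# Unequal sample sizes: they differ (about 4.83 vs 3.2 here).
c = [2.0, 4.0, 9.0]
print(var_diff_separate(a, c), var_diff_pooled(a, c))
```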

The example given on page 32, of a “Student’s t-test” is a t-test only 
by the accident of equal sample sizes. The same formulas, used with 
unequal sample sizes, would not give a t-test. The statement, “Some 
question exists as to whether a t-test is valid if the standard errors of 
the two means are considerably different” is erroneous. The relative 
magnitudes of the standard errors of the two means have nothing to do 
with the case. Very likely the difference referred to is that between the 
two population variances. But in case these differ widely there is no 
question about whether the t-test is valid or not. It is invalid. When
the two sample sizes are equal it would seem more sensible, if widely
differing population variances are expected, or even possible, to esti- 
mate the population variance of differences. This would give a t of 3.08 
with 4 degrees of freedom, rather than the 3.67 with 8 degrees of free- 
dom reported by the author. 
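The reviewer's alternative for equal sample sizes bases t on the differences between paired observations, as sketched below. The pairing here simply follows the order of listing, which is assumed to be unrelated to the measurements. The data are hypothetical, since the book's figures are not reproduced in the review.

```python
import math
from statistics import mean, stdev


def t_from_differences(x: list[float], y: list[float]) -> tuple[float, int]:
    """t for the mean difference, from the variance of paired differences."""
    if len(x) != len(y):
        raise ValueError("requires equal sample sizes")
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    return mean(d) / (stdev(d) / math.sqrt(n)), n - 1


# Hypothetical data: five determinations by each of two methods.
t, df = t_from_differences([10.0, 12.0, 11.0, 13.0, 14.0], [8.0, 9.0, 10.0, 9.0, 11.0])
print(round(t, 2), df)  # 5.1 4
```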

On the general question of unequal population variances it is not 
apparent that the author is following any particular published research, 
nor is there any reference to the literature of this problem. A. A. Aspin 
and B. L. Welch [1] recommend the same statistic that the present 
author does, but then they also supply a table of critical values of the 
statistic. H. Scheffé [6] shows how to calculate a quantity that has the 
t-distribution with the maximum possible number of degrees of free- 
dom, but it is by no means the statistic given by Gore. It may be 
doubted that the casual solution of the Fisher-Behrens problem of- 
fered in this book will satisfy many statisticians. It is to be feared that 
it will satisfy too many chemists, who, unaware of the confused state 
of the art, may assume that the equations given in the manual have 
the sanction of wide acceptance. 

A short section on the F-test for comparing two sample variances is 
followed by two pages on “propagation of error.” Gauss’ equations, 
which are approximate in general even when the population parameters 
are known, are intended, but sample statistics are used, even when 
only four degrees of freedom are available for each of two sample vari- 
ances, as in the numerical example presented. It would be more nearly 
correct to form five ratios at random from the two sets of five values 
given, to use these as five estimates of the population ratio, and then 
to use a t-value of 2.78 for four degrees of freedom, instead of the 2.0 
used in the text, which assumes an exact knowledge of the two vari- 
ances. This procedure would give a confidence interval of half-length 
0.018, roughly three times the width calculated by the author’s use of 
Gauss’ equations. The reader who wants to follow the calculation as 
printed may be slightly confused by two minor errors in arithmetic and 
two misplaced exponents in the formula. No indication is given of the 
approximations implied in the derivation of Gauss’ equations, nor of 
their use to derive confidence intervals for rational functions other than 
the sum and difference. 
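The two confidence intervals contrasted above can be set side by side for a ratio of two means. The data below are hypothetical, because the review does not reproduce the book's figures. Treating the quantity of interest as a simple ratio is also an assumption made for the example. The Gauss (propagation-of-error) interval uses 2.0 as the multiplier. The alternative uses five ratios and Student's t for 4 degrees of freedom.

```python
import math
from statistics import mean, stdev

x = [10.1, 10.4, 9.8, 10.0, 10.2]  # hypothetical values
y = [5.0, 5.2, 4.9, 5.1, 5.0]
n = len(x)

# (1) Gauss propagation of error for R = mean(x) / mean(y), treating the
# sample variances as if known and multiplying by 2.0.
r = mean(x) / mean(y)
rel_var = (stdev(x) ** 2 / n) / mean(x) ** 2 + (stdev(y) ** 2 / n) / mean(y) ** 2
half_gauss = 2.0 * r * math.sqrt(rel_var)

# (2) Five ratios from pairs formed at random (here the listed order is
# assumed random), with Student's t for 4 d.f.
T_975_DF4 = 2.776  # two-sided 95-per cent point of t, 4 d.f.
ratios = [a / b for a, b in zip(x, y)]
half_pairs = T_975_DF4 * stdev(ratios) / math.sqrt(n)

print(f"Gauss:  {r:.4f} +/- {half_gauss:.4f}")
print(f"Pairs:  {mean(ratios):.4f} +/- {half_pairs:.4f}")
```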

Chapter IV is on the Analysis of Variance. The assumptions of sta-
tistical independence and of a linear model are not stated. However, the
assumption of constant error variance is rightly emphasized. It is the
author’s judgment that this latter assumption “appears to be satisfied
within practical approximations by most experimental data sets.” He
proposes and gives Bartlett’s chi-square test for doubtful cases. This
practice has the disadvantage, pointed out by G. E. P. Box, that a test
sensitive to non-normality is being used to reject some applications of 
the F-test, the latter being in this respect a robust test, not sensitive to 
non-normality. 
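Bartlett's statistic, in its standard form, is sketched below; it is referred to chi-square with k − 1 degrees of freedom. Its sensitivity to non-normality is the drawback noted above. A large value may therefore reflect long-tailed data rather than unequal variances.

```python
import math


def bartlett(variances: list[float], dfs: list[int]) -> tuple[float, int]:
    """Bartlett's chi-square statistic for equality of k variances."""
    k = len(variances)
    total_df = sum(dfs)
    pooled = sum(f * v for v, f in zip(variances, dfs)) / total_df
    m = total_df * math.log(pooled) - sum(f * math.log(v) for v, f in zip(variances, dfs))
    c = 1.0 + (sum(1.0 / f for f in dfs) - 1.0 / total_df) / (3.0 * (k - 1))
    return m / c, k - 1


stat, df = bartlett([1.0, 4.0], [10, 10])
print(round(stat, 3), df)  # 4.25 1
```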

An “analysis of variance” of the 120 observed moisture percentages 
in the experiment summarized above is then presented. The model be- 
ing used is not given, the expected values of the mean squares are not 
shown, nor is there any indication of how the “duplicate” measure- 
ments were made. If, as appears plausible from the context, the five 
samples are assumed to be a random sample from the dryer, then the 
so-called Mixed Model, first described, so far as the reviewer is aware, 
by Mood [5], is appropriate. 

The writing down of the expected values of the mean squares would 
have shown, before the experiment was carried out, that duplicates, 
even if properly taken, are of little value, since they can only be used 
to test the analyst-by-sample-by-time interaction. Put in the terms of 
the chemist: If there may be wide variation in the per cent moisture in 
various parts of the dryer, then this variability should be sampled, and 
not the variability of the moisture-determination; i.e. more samples 
should be taken from the dryer. In terms of the analyst of variance: 
Half the degrees of freedom, and so half the measurements made in this 
experiment, were used in judging the significance of a three-factor in- 
teraction. The three two-factor interactions, and the variability be- 
tween different parts of the dryer, could all have been better deter- 
mined, in one sense twice as well, if ten dryer-samples had been taken 
and no duplicates run. 

The author fails to mention randomization in allocating treatments
to experimental units. As a result, in five of the seven examples in which
interactions are calculated (pages 58, 67, 78, 105, 107) the reported
error mean squares are an order of magnitude smaller than the inter-
action mean squares. It seems likely, then, that these error mean squares
were all calculated from “chemists’ duplicates,” that is, from measure-
ments made on parallel samples, which only in rare cases can be ex-
pected to give an unbiased estimate of the effects of all the chance fac-
tors operating in an experiment. On page 59 we read “This example
demonstrates that a very fallacious estimate of the reliability is some-
times given by considering only duplicate checks.” Unfortunately, there
is no mention of the method, which like so much else we owe to R. A.
Fisher, of getting an unbiased estimate of error.
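A small simulation shows why duplicates on parallel samples understate error. For illustration only, it assumes that each experimental run carries its own random disturbance, with standard deviation 1.0. The two duplicate determinations share that disturbance and differ only by analytical error, with standard deviation 0.2. These values and the function name are hypothetical.

```python
import random
import statistics

def duplicate_vs_run_error(n_runs=2000, sd_run=1.0, sd_meas=0.2, seed=1):
    """Compare the duplicate-based error estimate with run-to-run variation."""
    rng = random.Random(seed)
    half_sq_diffs = []
    run_means = []
    for _ in range(n_runs):
        run_effect = rng.gauss(0.0, sd_run)           # shared by both duplicates
        a = run_effect + rng.gauss(0.0, sd_meas)      # duplicate 1
        b = run_effect + rng.gauss(0.0, sd_meas)      # duplicate 2
        half_sq_diffs.append((a - b) ** 2 / 2)        # estimates sd_meas**2 only
        run_means.append((a + b) / 2)
    duplicate_error = statistics.fmean(half_sq_diffs)   # expected value 0.04
    run_to_run = statistics.variance(run_means)         # expected value 1.0 + 0.04 / 2
    return duplicate_error, run_to_run

print(duplicate_vs_run_error())
```

The duplicate-based figure reflects only the analytical error. The variation that actually affects comparisons between runs is roughly twenty-five times larger in this setup.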

The fifth chapter is on the Design of Experiments. The first section 
attacks the dogma of the controlled, one-factor-at-a-time experiment. 
Measurements of a yield, y (actually the volume of a fixed weight of 
gas), were made in quintuplicate at each of three pressures, 400, 600,

482 AMERICAN STATISTICAL ASSOCIATION JOURNAL, SEPTEMBER 1953

and 800 mm., the temperature being 25 degrees Centigrade. A plot of
$y$ versus the reciprocal of the pressure, $1/P$, gives a nice straight line.
Then two (quintuplicate) runs were made at 400 mm. pressure, but at
temperatures 100 and 200 C. Again a three-point plot versus $T$ gives
a straight line. The author then combines the two equations for the
two straight lines to get an equation of the form $Y = a + b_1 T + b_2/P$,
and shows that this equation predicts the yield at a pressure of 800 mm.
and a temperature of 200 C very poorly indeed. But surely it would be
an unusual chemist who would combine the two linear equations in
this way, for this would imply that he had forgotten Boyle’s and
Charles’s laws. The force of the example, planned to show the weakness
of the “classical” approach, is vitiated by the implausible equation
used. Using the form of equation which a chemist might well
choose for the data at hand, viz. $Y = k(T + a)/P$, and evaluating $a$ from
one graph to be 277 and $k$ from the other plot to be 41.38, this chemist,
innocent of least-squares methods, would find that the yield at 800 mm.
and 200 C can be quite closely predicted: 24.67 calculated, 25.14
observed. Such a chemist might well conclude that nothing has been
shown to be wrong with conventionally controlled experiments.
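The reviewer's arithmetic is easy to check. The minimal sketch below uses only the constants quoted above, $a = 277$ and $k = 41.38$. The fitted $a$ lies close to the 273 degrees that separate the Centigrade zero from absolute zero in Charles's law.

```python
def predicted_yield(pressure_mm, temp_c, k=41.38, a=277.0):
    """Y = k (T + a) / P, the Boyle-Charles form suggested in the review."""
    return k * (temp_c + a) / pressure_mm

predicted = predicted_yield(800, 200)
observed = 25.14
print(f"calculated {predicted:.2f}, observed {observed:.2f}")
# calculated 24.67, observed 25.14
```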

The discussion of Factorial Design opens with a droll definition of 
orthogonality: “Such a design is said to be completely orthogonal be- 
cause the design can be symbolized by squares or rectangles.” The idea 
of additivity of effects is not mentioned; the treatment of the comple- 
mentary concept, interaction, is somewhat distressing, two of the three 
“types” given being in error. 

“Interaction in experiments in chemistry,” Gore writes, “usually can
be classified into one of three types of mathematical functions or into
a combination of these:

(1) Hyperbolic (cross product) relationship between variables (e.g.
$Y = KPT$ or $Y = KT/P$).
(2) Power function relationship (e.g., $Y = K_1 P^2 + K_2 T^2 + \cdots$ etc.)
(3) Exponential function relationship (e.g., $Y = K_1 \log P + K_2 \log T + \cdots$ etc.)”

The accompanying discussion does not clarify or correct the impres-
sion given by these examples. There is no interaction between $P$ and $T$
in (2) or (3).
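One way to see this is through the $2 \times 2$ interaction contrast $f(P_2,T_2) - f(P_2,T_1) - f(P_1,T_2) + f(P_1,T_1)$. This contrast vanishes whenever the response is a sum of a function of $P$ and a function of $T$. The sketch below evaluates it for each of Gore's examples. The constants and the two temperature levels are arbitrary and serve only as an illustration.

```python
import math

def interaction_contrast(f, p1, p2, t1, t2):
    """2 x 2 interaction contrast; zero when f(P, T) = g(P) + h(T)."""
    return f(p2, t2) - f(p2, t1) - f(p1, t2) + f(p1, t1)

K, K1, K2 = 2.0, 0.5, 3.0   # arbitrary illustrative constants
examples = {
    "(1) KPT": lambda p, t: K * p * t,
    "(1) KT/P": lambda p, t: K * t / p,
    "(2) power": lambda p, t: K1 * p**2 + K2 * t**2,
    "(3) log": lambda p, t: K1 * math.log(p) + K2 * math.log(t),
}
for name, f in examples.items():
    # pressures 400 and 800 mm as in the review; temperatures chosen arbitrarily
    print(f"{name:10s} {interaction_contrast(f, 400, 800, 300, 400):.6g}")
```

Only the two forms under (1) give a nonzero contrast. The contrasts for (2) and (3) are zero, apart from rounding.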

The reason given for expecting large interactions in chemical experi- 
ments is that “the mechanisms involved in chemical changes are fun- 
damentally complex.” This argument should apply with at least equal 
force to all the fields in which statistical methods have been used. By 
the same token, we should expect even greater interactions in the
social and biological sciences. The commonest reason for the existence of
interactions is, of course, ignorance. When the experimenter does not 
know the form of the functional relation between his “factors” and his 
outcomes, or when the range of variation of some of the factors, and 
of the resultant outcomes, is so great that the equations he has used 
before break down, then he finds interactions. It appears likely that 
the author’s belief in the frequent occurrence of interactions in chemical
experimentation is correlated with his failure to recommend
randomization and with his consequent underestimation of error
variances.

The next section is on the Estimation of Experimental Error. An ex- 
cellent example is provided by a set of data taken to evaluate a method 
of measuring the per cent weight-loss of vinyl polymers when heated 
at 260 C. Two samples, one stabilized and one not, were measured by 
two operators, after two times of heating, on two different days, in 
duplicate. Unfortunately (perhaps deliberately, but neither reasons nor 
consequences are given) the stabilized sample was examined on two 
days and the unstabilized sample on two other days. If the 32 measurements
had been arranged in a fully balanced way, as a single replication
of a $4 \times 2^3$, then some judgment could have been made on a matter
that can only be assumed from the data given, namely that the difference
between samples was the same on different days.
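The fully balanced arrangement suggested above crosses every factor with every other factor, so that each sample appears on every day. The minimal Python sketch below lays out such a design and shuffles the run order, as randomization would require. The day labels and operator names are hypothetical. The two heating times are the 30 and 40 minutes mentioned below.

```python
import random
from collections import Counter
from itertools import product

days = ["day 1", "day 2", "day 3", "day 4"]
samples = ["stabilized", "unstabilized"]
operators = ["operator A", "operator B"]     # hypothetical labels
heating_times = ["30 min", "40 min"]

# One replication of a 4 x 2^3 factorial: 4 * 2 * 2 * 2 = 32 runs.
runs = list(product(days, samples, operators, heating_times))

# Randomize the order in which the runs are carried out.
random.Random(0).shuffle(runs)

# Every sample meets every day (4 runs per day-sample cell), so the
# sample difference can be compared from day to day.
cells = Counter((day, sample) for day, sample, _, _ in runs)
print(len(runs), set(cells.values()))
```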

Noting that the two samples gave differing discrepancies between 
duplicates, the author reports two error mean squares, but only one 
sample × time interaction mean square (p. 71). He uses the larger error
mean square to test the interaction and judges it significant. He con- 
cludes that “From these results it does not appear feasible to estimate 
a reliability for the analytic method which will be independent of the 
type of sample analysed.” 

Inspection of the data makes it quite clear that the range of dupli- 
cate pairs for the unstabilized sample is about ten times that for the 
stabilized one. The ratio of the actual percentages of weight-loss is also 
about 10:1. It seems then that the coefficient of variation can be as- 
sumed constant. An analysis of variance of the logarithms of the values 
given permits eleven hypotheses to be tested (four on main effects, five 
on two-factor interactions, and two on three-factor interactions). All 
the mean squares for interactions are less than the error mean square, 
as are those for analysts and for days. Thus a set of simple conclusions, 
opposed to those drawn by the author, can be given. The precision of 
the method used, expressed as a coefficient of variation, is 5%; this
value holds for samples containing from 2 to 24 per cent volatile
material. Secondly, the effect of increasing the time of heating from 30
to 40 minutes is to increase the per cent weight-loss by a factor of about
1.22. (The true value of this factor lies with 95% certainty in the 
range 1.17 to 1.27.) This increase holds for both sample types, for 
either operator, and for all days. Finally, the ratio of results by the 
two operators does not differ with statistical significance from 1.00, ly- 
ing with 95% confidence in the range 0.96 to 1.04. The latter con- 
clusion holds for both types of sample, for either time of heating, and 
for all days. 
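The reasoning behind the log transformation can be checked numerically. Suppose the standard deviation is proportional to the mean, so that the coefficient of variation is constant. Then the standard deviation of the logarithms is roughly the coefficient of variation itself, whatever the level. The sketch below assumes normal errors with a 5% coefficient of variation at 2 and 24 per cent, the two levels mentioned above. The data are simulated and are not the book's.

```python
import math
import random
import statistics

def sd_raw_and_log(level, cv=0.05, n=5000, seed=2):
    """Simulate values with constant CV; return SD on the raw and log scales."""
    rng = random.Random(seed)
    values = [level * (1 + cv * rng.gauss(0.0, 1.0)) for _ in range(n)]
    logs = [math.log(v) for v in values]
    return statistics.stdev(values), statistics.stdev(logs)

for level in (2.0, 24.0):
    raw_sd, log_sd = sd_raw_and_log(level)
    print(f"level {level:5.1f}: raw SD {raw_sd:.3f}, SD of logs {log_sd:.4f}")
```

On the log scale an effect becomes a ratio. A difference $d$ between mean logarithms corresponds to a factor $e^d$, which is how multiplicative statements such as “a factor of about 1.22” arise.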

The failure of the text to give a clear discussion of error and of
randomization as a necessary part of the statistical design of an
experiment continues to plague later sections. The discussions of
interaction and error, of replication, and of confounding are greatly
weakened by this obscurity.

In the opinion of this reviewer, the lengthy discussion of the use of
$2 \times 2$ Latin squares is not of great value. In the only example given, in
which a half-replicate of a $2^3$ was run (three factors, each at two levels),
it was decided to carry through the other half-replicate anyway. Most
statisticians would have proposed the full $2^3$ in the first place, especially
in view of the author’s insistence on the wide prevalence of interactions
in chemical experimentation.
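A $2 \times 2$ Latin square in three two-level factors amounts to a half-replicate of a $2^3$. The minimal sketch below uses the usual $\pm 1$ coding and the defining relation $I = ABC$. This relation is a standard choice assumed here, not one taken from the book.

```python
from itertools import product

full = list(product((-1, 1), repeat=3))   # all 8 runs of a 2^3 in (A, B, C)

# Defining relation I = ABC: keep runs whose product is +1.
half = [run for run in full if run[0] * run[1] * run[2] == 1]
other_half = [run for run in full if run[0] * run[1] * run[2] == -1]

# In each half, C is fixed by A and B (C = AB in the first half), which is
# the Latin-square property: rows A, columns B, "letters" C.
for a, b, c in half:
    print(a, b, c)
```

Running `other_half` afterwards, as in the book's example, completes the full $2^3$. The reviewer's point is that the full design could simply have been planned at the outset.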

A fairly general treatment of linear hypotheses would have greatly 
simplified the presentation of the Analysis of Variance and of the De- 
sign of Experiments. The distinctions between the three broad classes 
of linear models in the analysis of variance (I, II, and Mixed) are not 
made, nor is a numerical example given of a components-of-variance 
problem. 
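The balanced one-way random model is the simplest components-of-variance case. There, $E(\mathrm{MS}_{\mathrm{between}}) = \sigma_e^2 + n\sigma_a^2$ and $E(\mathrm{MS}_{\mathrm{within}}) = \sigma_e^2$, so $\hat\sigma_a^2 = (\mathrm{MS}_{\mathrm{between}} - \mathrm{MS}_{\mathrm{within}})/n$. The minimal sketch below applies these formulas to three groups of three readings, which are made up for illustration. Truncating a negative estimate at zero is one common convention and is an assumption here.

```python
def variance_components(groups):
    """Moment estimates for the balanced one-way random-effects model."""
    k = len(groups)
    n = len(groups[0])
    assert all(len(g) == n for g in groups), "balanced data assumed"
    means = [sum(g) / n for g in groups]
    grand = sum(means) / k
    ms_between = n * sum((m - grand) ** 2 for m in means) / (k - 1)
    ms_within = sum(
        (x - m) ** 2 for g, m in zip(groups, means) for x in g
    ) / (k * (n - 1))
    sigma2_a = max((ms_between - ms_within) / n, 0.0)  # truncated at zero
    return ms_between, ms_within, sigma2_a

# Hypothetical data: three batches, three determinations each.
data = [[9.8, 10.2, 10.0], [11.1, 10.9, 11.3], [9.5, 9.3, 9.7]]
print(variance_components(data))
```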

The chapter on correlation and regression follows the usual pattern, 
dealing first with linear regression on one sure variable, then with cur- 
vilinear regression, and finally with the multiple linear case. The for- 
mula given for the estimated standard error of the intercept of a 
straight line (8, page 131) is in error, being in fact the equation for the 
standard error of the mean value of y. The y-variate is wrongly referred
to as the dependent variable, and the x-variate as the causal variable.
The assumptions underlying the derivations of the equations given are 
not indicated, nor is there any advice on how to handle linear regres- 
sion when one or more of these assumptions are not satisfied. The stand- 
ard references on these problems are not given. The improvement in 
fit due to shifting from measurements to their logarithms (p. 136) is 
mistakenly judged by using Fisher’s z-transformation for comparing 
two independent estimates of the same population correlation coefficient.
Finally, the “95% reliability limits for estimating $Y_0$” (the value
predicted from a multiple linear regression equation) are erroneously
given as a constant, not depending on the x-coordinate of the point
(p. 156).
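The quantities the reviewer distinguishes can be sketched for a straight line fitted by least squares. The sketch uses one regressor for brevity. In the multiple case, the term $(x_0 - \bar x)^2/S_{xx}$ becomes a quadratic form in the deviations of $x_0$ from the means. The data are hypothetical. The intercept's standard error differs from that of $\bar y$ unless $\bar x = 0$, and the uncertainty of a predicted value grows as $x_0$ moves away from $\bar x$.

```python
import math

def line_standard_errors(xs, ys, x0):
    """Least-squares line y = a + b x and several standard errors."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    a = ybar - b * xbar
    s = math.sqrt(sum((y - a - b * x) ** 2 for x, y in zip(xs, ys)) / (n - 2))
    return {
        "se_mean_y": s / math.sqrt(n),                               # SE of ybar
        "se_intercept": s * math.sqrt(1 / n + xbar ** 2 / sxx),      # SE of a
        "se_fit_at_x0": s * math.sqrt(1 / n + (x0 - xbar) ** 2 / sxx),
        "se_new_obs_at_x0": s * math.sqrt(1 + 1 / n + (x0 - xbar) ** 2 / sxx),
    }

xs = [1.0, 2.0, 3.0, 4.0, 5.0]           # hypothetical data
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
near = line_standard_errors(xs, ys, x0=3.0)
far = line_standard_errors(xs, ys, x0=10.0)
print(near["se_intercept"], near["se_mean_y"])
print(near["se_fit_at_x0"], far["se_fit_at_x0"])
```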

The last chapter is on Attribute Statistics. 

An Appendix includes five tables—the “normal distribution area,” 
some percentage points of the cumulative distributions of t, chi-square, 
F, and of the sample correlation coefficient. No acknowledgment is 
made or source indicated. 

An annotated bibliography of 14 works is given but it will be clear 
from the comments above that this reviewer does not attach great 
weight to the opinions offered. 


III. SUMMARY AND CONCLUSION 


There is a real need for a manual to aid chemists in applying statisti- 
cal methods to experimentation. The book under review fails to satisfy 
any part of that need. Some of its omissions are major ones; for ex- 
ample, operating characteristic curves, randomization, and several of 
the basic assumptions underlying analysis of variance and linear re- 
gression are not mentioned. Its errors of presentation are equally seri- 
ous. For example, the notions of orthogonality and of interaction are 
misdefined and misused; the distinction between parameters and sta- 
tistics is repeatedly confused. 

The editors and publishers must share this adverse criticism with the
author, since in many places the obscurity of the language used could
easily have been rectified.


REFERENCES 


[1] Aspin, A. A., “Tables for use in comparisons whose accuracy involves two 
variances, separately estimated,” With an Appendix by B. L. Welch, Bio- 
metrika, 36 (1949), 290-96. 

[2] Brownlee, K. A., Industrial Experimentation, Third American Edition. 
Brooklyn: Chemical Publishing Company, 1949. 

[3] Davies, O. L., editor, Statistical Methods in Research and Production, Second
Edition Revised. London: Published for Imperial Chemical Industries Limited,
by Oliver and Boyd, 1949.

[4] Hader, R. J., and Youden, W. J., “Experimental statistics,” Analytical Chem- 
istry, 24 (1952), 120-24. 

[5] Mood, A., Introduction to the Theory of Statistics. New York: McGraw-Hill 
Book Company, 1950. 

[6] Scheffé, H., “On solutions of the Behrens-Fisher problem, based on the t-distribution,”
Annals of Mathematical Statistics, 14 (1943), 35-44.

[7] Youden, W. J., Statistical Methods for Chemists. New York: John Wiley and 
Sons, Inc., 1951. 





LIFE TESTING* 


BENJAMIN EPSTEIN AND MILTON SOBEL†
Wayne University 


I. INTRODUCTION


IN THIS paper we discuss statistical problems which arise when the
observations become available in an ordered manner. Usually ob- 
servations made on a random variable do not become available in this 
way. If n items are taken from a machine and measured for some char- 
acteristic such as diameter, it would be quite an anomaly and indeed a 
cause for concern if the first item taken from the machine had the small- 
est diameter; the second item, the second smallest diameter, etc. How- 
ever, there do exist numerous practical situations, for example, life 
testing, fatigue testing, and other kinds of destructive test situations, 
where the data do become available in this way. If n radio tubes are 
put through a life test, for example, then the weakest one fails first in 
time, the second weakest one fails next, etc. Indeed it seems fairly clear 
that observations will naturally occur in an ordered manner in life test 
situations whether we talk about the life of electric bulbs, life of radio 
tubes, life of ball bearings, life of various kinds of physical equipment, 
or length of life after some treatment performed on animals or human 
beings. There are still other situations, for example, testing the current 
needed to blow out a fuse, the voltage needed to break down a
condenser, the force needed to rupture some physical material, etc., where
observations become available in order if one arranges the test in such 
a way that every item in the sample is subjected to the same stimulus 
(current, voltage, stress, dosage, etc.) so that the first weakest item 
fails, then the second weakest item fails, etc. 

Put in general terms, we test n items drawn at random from some 
population and the data become available in such a way that the small- 
est observation comes first, the second smallest second, . . . , and finally 
the largest observation last. Clearly we can, if we choose, discontinue 
experimentation after we have observed the first r failures in a life test. 
What are the advantages associated with the possibility of stopping 
before all n observations are made? It seems that two principal
advantages stem from the fact that the observations occur in an ordered

* The work described here has been carried out under an Office of Naval Research Contract. Some
of the results were obtained at Stanford University in the summer of 1951. This paper is essentially
a lecture given at the Stanford Inspection Conference, August 20-22, 1951.

† Now at Cornell University.

LIFE TESTING 487 


manner. These are that we may be able to reach a decision in a shorter 
time or with fewer observations than if we were to utilize a procedure 
which involves observing what happens to all the items under test (and 
thus in effect disregards the basic fact that information is being fed to 
us in an ordered manner). 


II. PREVIOUS LITERATURE 


The published literature dealing with the possibility of making use 
of order to reduce the time of experimentation or the number of obser- 
vations or both is, as far as we know, limited to three papers. In the 
first two of these papers the underlying distribution is assumed to be 
normal. The first paper was by Jacobson [7], who compares the operat- 
ing characteristic curves (for testing the mean of a normal distribution) 
of a test procedure based on the lowest 3 out of 5 observations with that 
based on the average of 5 out of 5, and 4 out of 4. He shows that the 
operating characteristic curve based on the average of the lowest 3 out 
of 5 observations is almost the same as one based on the average of 4 
out of 4 observations. He points out that in many cases the 2 out of 5 
items which have not been tested (since one stops the test after having 
observed the three smallest values) are for all practical purposes as 
good as new and consequently one has gained as much from using up 3 
items as one could have by using 4. In the case of certain electrical 
tests, ordered observations such as Jacobson considers can be obtained 
by placing the items tested in a test panel so that they are all subjected 
to the same current or voltage, but are not destroyed simultaneously. 
Hence one can get additional information by simply placing new fuses 
or tubes in the panel as the old ones fail. Jacobson considers only test 
panels of five sockets. The need for generalization to n sockets is clear. 

The second published paper is by Walsh [9]. The underlying assumption
is in the main still the one involving normality, and the values of
$r$ and $n$ considered are very large (asymptotic theory). While
Jacobson’s main emphasis was on using order to cut down on the number of
observations (presumably because each item destroyed is expensive), 
Walsh’s emphasis is both on the time-saving and observation-saving 
possibilities associated with using order. 

The third paper is by Halperin [6], who assumes only that the underlying
probability density function, $f(x; \theta)$, is subject to certain mild
regularity conditions. Again one is dealing with the asymptotic situation
where $r$ and $n$ are large. The principal result is that $\hat\theta$, the maximum
likelihood estimate of $\theta$, is consistent, asymptotically normally
distributed, and of minimum variance for large samples. A general
expression is given for the variance of the asymptotic distribution of $\hat\theta$.
Small sample estimation of $\theta$ is considered for two special cases, one of
these being the exponential density. As we shall see, this bears a close
relationship to the present paper.


III. SOME RESULTS IN THE EXPONENTIAL CASE 


When we started our work on this general problem one possible first
problem was to generalize Jacobson’s results in the normal case for
various combinations of $r$ and $n$. After some consideration and after
discussion with electronics experts, however, we decided to turn our
initial efforts to ordered observations drawn from non-normal
distributions. Specifically, we decided to study the case where the
characteristic $X$ being investigated has an exponential distribution with a
density $f(x; \theta)$ of the form

$$f(x; \theta) = \frac{1}{\theta}\, e^{-x/\theta}, \qquad \theta > 0,\ x > 0. \tag{1}$$

If $x$ is considered as life in hours, it appears that by choosing $\theta$ suitably,
one can fit reasonably well the distribution of life for many types of
electronic tubes. While this assumption will ultimately need investigation
(since perhaps the distributions are more nearly type III or some
other skewed form), let us see what can be said if the density $f(x; \theta)$ is
really of this exponential form. Speaking in physical terms, we may say
that $\theta$ is just the average life in hours since

$$E(X) = \int_0^\infty x \cdot \frac{1}{\theta}\, e^{-x/\theta}\, dx = \theta. \tag{2}$$


The first questions asked were these. Suppose $n$ items are drawn from a
distribution with a density of the form $f(x; \theta) = (1/\theta)e^{-x/\theta}$. Suppose the
observations¹ become available in order, i.e., $x_{1,n} \le x_{2,n} \le \cdots \le x_{r,n}
\le \cdots \le x_{n,n}$. Suppose experimentation is discontinued after the first $r$
observations are made. Then

(a) What is $\hat\theta_{r,n}$, the maximum likelihood estimate of $\theta$?

(b) What is the distribution of $\hat\theta_{r,n}$?

Without going into detail at this point, we assert that

$$\hat\theta_{r,n} = \frac{x_{1,n} + x_{2,n} + \cdots + x_{r,n} + (n-r)\,x_{r,n}}{r}. \tag{3}$$

¹ By $x_{i,n}$ we mean the $i$th smallest observation in a sample of $n$ ordered observations.

Further, $\hat\theta_{r,n}$ is unbiased and has a chi-square distribution² which
depends only on $r$ and not on $n$, and is, in fact, identical with the
distribution of $\hat\theta_{r,r}$. From the point of view of estimation, this means that the
estimate (3) has exactly the same precision as does

$$\hat\theta_{r,r} = \frac{x_{1,r} + x_{2,r} + \cdots + x_{r,r}}{r}, \tag{4}$$

the average of $r$ out of $r$ observations. From the point of view of
acceptance testing, the operating characteristic curve based on the lowest $r$
out of $n$ ordered observations (acceptance region of the form $\hat\theta_{r,n} > C$
or $\hat\theta_{r,n} < C$)³ is identical with that based on all $r$ out of $r$ observations.
Detailed proofs of the statements just made are given in Section 1 of
the Appendix.
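The claims about (3) can be checked by simulation. The sketch below computes $\hat\theta_{r,n}$ from the $r$ smallest of $n$ simulated exponential lifetimes, for two different values of $n$. The names and parameter values are illustrative. It compares the mean and variance of the estimate with those implied by the chi-square result, namely $E(\hat\theta_{r,n}) = \theta$ and $\operatorname{Var}(\hat\theta_{r,n}) = \theta^2/r$.

```python
import random
import statistics

def theta_hat(first_r, n):
    """Equation (3): MLE of theta from the r smallest of n ordered lifetimes."""
    r = len(first_r)
    return (sum(first_r) + (n - r) * first_r[-1]) / r

def simulate_theta_hat(theta, r, n, reps=10000, seed=3):
    """Repeatedly test n items, stop at the r-th failure, and estimate theta."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        lives = sorted(rng.expovariate(1.0 / theta) for _ in range(n))
        estimates.append(theta_hat(lives[:r], n))
    return estimates

theta, r = 1000.0, 5
for n in (5, 12):
    est = simulate_theta_hat(theta, r, n)
    print(n, statistics.fmean(est), statistics.variance(est), theta**2 / r)
```

Both values of $n$ should give a mean near 1000 and a variance near 200000. This agrees with the statement that the distribution depends on $r$ only.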


IV. SOME REMARKS ON THE TIME SAVING FEATURE OF THE TEST 


If we assume, as we may in some cases, that the $(n-r)$ untested items
are essentially as good as new, then we are clearly in a situation where 
taking the lowest r out of n observations uses up the same number of 
items as taking all r out of r. What, then, is the justification for using 
the first procedure as against the second procedure? The answer is that 
the only justification is to save time. For instance, a test procedure 
which involves taking the smaller of two random observations will lead 
to a test whose operating characteristic curve is the same as that found 
in observing 1 out of 1. However, the expected length of time for the 
first procedure is only one-half that for the second procedure. Conse- 
quently, the procedure involving the use of the smaller of two observa- 
tions and stopping there is to be preferred to the one involving taking 
just one observation at random, if the saving in time outweighs the 
loss due to testing two items rather than one. Even if the $(n-r)$ items
are not as good as new, it might be of critical importance to be able to
come to a decision quickly. The decision might involve the disposition
of thousands of items, and the possibility of coming to a decision quickly
without increasing the risks of making a wrong decision might well be
worth the cost of $(n-r)$ additional items.

Let $E(X_{r,n})$ be the expected length of time needed to observe the first
$r$ out of $n$ ordered observations, and let $E(X_{r,r})$ be the expected length
of time needed to observe all $r$ out of $r$. Then the ratio $E(X_{r,n})/E(X_{r,r})$
is a measure of the expected saving in time due to using the first
procedure as compared with the second procedure.⁴ In Table 1 we give the
values of this ratio for selected small values of $r$ and $n$. This table shows
that if “time is money,” procedures which use ordered observations may
be very advantageous.

² More precisely, $2r\hat\theta_{r,n}/\theta$ is distributed as chi-square with $2r$ degrees of freedom.

³ Clearly a region of acceptance such as $\hat\theta_{r,n} > C$ is reasonable. If this average is bigger than a certain
quantity which can be computed in advance, it seems intuitively reasonable, and is indeed theoretically
sound, to accept the hypothesis that the true mean life is some desired high value $\theta_1$. If, on the other
hand, $\hat\theta_{r,n} < C$, we accept the alternative hypothesis that the true mean life is some low (undesired)
value $\theta_2$. This question is discussed in some detail in Section 3 of the Appendix.


TABLE 1

RATIO OF THE EXPECTED WAITING TIME TO OBSERVE
THE $r$TH FAILURE IN SAMPLES OF SIZE
$n$ AND $r$ RESPECTIVELY:
$E(X_{r,n})/E(X_{r,r}) = a_{r,n}$
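The entries $a_{r,n}$ of Table 1 follow from the formula for $E(X_{r,n})$ given in footnote 4. The minimal Python sketch below computes them exactly with rational arithmetic, since $\theta$ cancels in the ratio. The function names are illustrative.

```python
from fractions import Fraction

def expected_rth_failure(r, n):
    """E(X_{r,n}) / theta = sum_{j=1}^{r} 1/(n - j + 1) for exponential lives."""
    return sum(Fraction(1, n - j + 1) for j in range(1, r + 1))

def time_ratio(r, n):
    """a_{r,n} = E(X_{r,n}) / E(X_{r,r})."""
    return expected_rth_failure(r, n) / expected_rth_failure(r, r)

for r, n in [(1, 2), (3, 5), (10, 14), (10, 20)]:
    print(r, n, float(time_ratio(r, n)))
```

For instance, $a_{1,2} = 1/2$. This is the halving of expected waiting time noted in Section IV for the smaller of two observations.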

















V. A TEST PROCEDURE


The next question we ask is how to find a test procedure which
will approximate a prescribed operating characteristic curve. Put in
statistical terms, we want to test the hypothesis $H_0: \theta = \theta_1$ against the
alternative $H_1: \theta = \theta_2 < \theta_1$. It turns out that our rule of action should be:
accept $H_0$ if $\hat\theta_{r,n} > C$ and reject $H_0$ if $\hat\theta_{r,n} < C$. In particular, how do we
find $r$ and $C$ if we require that the operating characteristic curve shall
be such that for $\theta = \theta_1$, $L(\theta_1) = \Pr(\text{accept } \theta = \theta_1 \mid \theta_1 \text{ is true}) = 1 - \alpha$
and for $\theta = \theta_2$, $L(\theta_2) = \Pr(\text{accept } \theta = \theta_1 \mid \theta_2 \text{ is true}) \le \beta$?
$\alpha$ and $\beta$ may be thought of as errors of the first and second kind or as
producer’s and consumer’s risks, respectively. It turns out to be possible
to find $r$ and $C$, given $\theta_1$, $\theta_2$ (really the ratio $\theta_1/\theta_2$ is all that is
needed) and $\alpha$, $\beta$. The computation can be greatly simplified for selected
values of $\alpha$ and $\beta$ by using the results and tables contained in a
paper by Eisenhart [3]. Also of use in this connection are the results
and operating characteristic curves given in a paper by Ferris, Grubbs,
and Weaver [4]. A detailed treatment of how to find a “best” test and
compute its operating characteristic curve is given in Section 3 of the
Appendix.

⁴ For example, suppose that it takes on the average $T_r$ hours until the testing of all $r$ out of $r$ items
is completed. Suppose that $E(X_{r,n})/E(X_{r,r}) = a_{r,n}$. Then the expected length of time required for the
failure of the first $r$ out of $n$ is given by $a_{r,n}T_r$. It can be shown that

$$E(X_{r,n}) = \theta \sum_{j=1}^{r} \frac{1}{n-j+1} \quad \text{for } f(x; \theta) = \frac{1}{\theta}\, e^{-x/\theta},\ x > 0.$$

The formula for $E(X_{r,n})$ and other pertinent results are derived in Section 2 of the Appendix.

VI. AN EXAMPLE 


For example, if $\theta_1/\theta_2 = 3$ and $\alpha = \beta = .05$, it is easy to show that a suitable
$r$ to use is 10. If $\theta_1/\theta_2 = 3$ and $\alpha = .10$ and $\beta = .05$, then the proper $r$
is 8. Similarly, if $\theta_1/\theta_2 = 3$ and $\alpha = .05$ and $\beta = .10$, then the proper $r$ is
8. This means, for instance, that if we want the test procedure to accept
a lot whose average life is $\theta_1 = 1500$ hours 95 per cent of the time,
and to accept a lot whose average life is $\theta_2 = 500$ hours only 5 per cent
of the time, then a possible procedure is to observe $x_{1,n}, x_{2,n}, \ldots, x_{10,n}$,
the first 10 among $n$ items ($n > 10$), and if $\hat\theta_{10,n} > 814$ hours accept $\theta = \theta_1$,
if $\hat\theta_{10,n} < 814$ hours accept $\theta = \theta_2$. Such a procedure will have an operating
characteristic curve for which $L(\theta_1) = .95$ and $L(\theta_2) \le .05$.⁵ It should
be noted that $n$, the number of items tested, is left arbitrary. If one’s
object is to reduce testing time, then it is clearly advisable from Table 1
to make $n$ more than 10.
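The values $r = 10$ and $C \approx 814$ can be reproduced from footnote 2. Let $\chi^2_{\nu;p}$ denote the lower $p$ point of chi-square with $\nu$ degrees of freedom. Because $2r\hat\theta_{r,n}/\theta$ is chi-square with $2r$ degrees of freedom, taking $C = \theta_1 \chi^2_{2r;\alpha}/(2r)$ gives $L(\theta_1) = 1 - \alpha$. The requirement $L(\theta_2) \le \beta$ then amounts to $\theta_1/\theta_2 \ge \chi^2_{2r;1-\beta}/\chi^2_{2r;\alpha}$. The sketch below is one way to search for the smallest such $r$ and is not necessarily the authors' own computation. It avoids external libraries by using the exact Poisson-sum form of the chi-square tail, which holds for even degrees of freedom.

```python
import math

def chi2_sf_even(x, r):
    """Pr(chi-square with 2r d.f. > x) = Pr(Poisson(x/2) <= r - 1)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, r):
        term *= half / k
        total += term
    return math.exp(-half) * total

def chi2_quantile_even(p, r):
    """Lower p point of chi-square with 2r d.f., by bisection."""
    lo, hi = 0.0, 1.0
    while 1.0 - chi2_sf_even(hi, r) < p:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if 1.0 - chi2_sf_even(mid, r) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def design_test(theta1, theta2, alpha, beta, r_max=200):
    """Smallest r, and C, for the rule: accept theta1 if theta_hat_{r,n} > C."""
    for r in range(1, r_max + 1):
        lower = chi2_quantile_even(alpha, r)        # sets L(theta1) = 1 - alpha
        upper = chi2_quantile_even(1.0 - beta, r)   # needed for L(theta2) <= beta
        if upper / lower <= theta1 / theta2:
            return r, theta1 * lower / (2 * r)
    raise ValueError("no r <= r_max meets the requirements")

def oc(theta, r, c):
    """L(theta) = Pr(theta_hat_{r,n} > c | theta)."""
    return chi2_sf_even(2 * r * c / theta, r)

r, c = design_test(1500, 500, 0.05, 0.05)
print(r, round(c), oc(1500, r, c), oc(500, r, c))
```

The same search gives $r = 8$ for the two other combinations of $\alpha$ and $\beta$ quoted above.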


VII. A TEST PROCEDURE BASED ONLY ON $x_{r,n}$


We should like to raise the possibility of another kind of decision
rule which is simpler to state and to apply, and which has a power curve
which coincides for all practical purposes with one based on $\hat\theta_{r,n}$. To see
the motivation behind this procedure, we examine equation (3). It will
be noted that in (3) $x_{r,n}$ is weighted more heavily (if $r < n$) than are the
earlier observations $x_{1,n}, x_{2,n}, \ldots, x_{r-1,n}$. This naturally raises the
question of how much one loses (for example, in choosing between $\theta_1$ and
$\theta_2$, with $\theta_2 < \theta_1$) if only $x_{r,n}$ is used in making a decision. We know that for
given $\theta_1/\theta_2$, $\alpha$, $\beta$ one can find an acceptance region for $\theta_1$ of the form:
Accept $\theta_1$ if $\hat\theta_{r,n} > C_1$ (reject otherwise). The question is whether for the
same $r$ one can choose $n$ sufficiently large and a suitable $C_2$ such that
the rule: Accept $\theta_1$ if $x_{r,n} > C_2$, reject otherwise, has an operating
characteristic curve which is for all practical purposes coincident with the
one based on the rule $\hat\theta_{r,n} > C_1$. This is possible, and $n$ need not be much
larger than $r$. For example, we saw that if $\theta_1/\theta_2 = 3$ and $\alpha = \beta = .05$, we
needed to take the first 10 out of $n > 10$ observations if we were to base
our decision on the value of $\hat\theta_{10,n}$ and get the required operating
characteristic curve. It turns out that if $n \ge 14$, just as good a decision rule
(in the sense of giving the same operating characteristic curve as one
based on $\hat\theta_{10,n}$) is to use just $x_{10,n}$, the tenth value in a sample of size $n$.
If $n = 14$ and we are committed to taking the first 10 observations and
then stopping and making a decision, it appears that little information
is lost by forgetting about the nine earlier observations $x_{1,n}, x_{2,n}, \ldots,
x_{9,n}$.

⁵ By actual computation $L(\theta_2)$ is in this case equal to .048.

As a specific example, we go back to testing $\theta_1 = 1500$ against $\theta_2 = 500$
with $L(\theta_1) = 1 - \alpha = .95$ and $L(\theta_2) = \beta = .05$. Let our test now be based
just on $x_{10,20}$, the 10th smallest observation in a sample of size 20. Then
it can be shown that the decision rule: Accept $\theta = \theta_1$ if $x_{10,20} > 540$ hours,
reject otherwise (i.e., accept $\theta_2$), yields an operating characteristic curve
with the prescribed $\alpha$, $\beta$. This can be put another way: If the 10th
observation among 20 ordered observations occurs before 540 hours,
accept $\theta_2 = 500$ hours (i.e., reject $\theta_1 = 1500$ hours); if the 10th observation
appears after 540 hours, accept $\theta_1 = 1500$ hours. Clearly, in the latter
event, we would stop at 540 hours and not go on, since it is a fortiori
true that $x_{10,20} > 540$ if the 10th observation has not occurred by 540
hours. Thus, the idea of using just the tenth observation leads in a most
natural way to a consideration of truncated procedures.
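The operating characteristic of the rule based on $x_{10,20}$ alone reduces to a binomial calculation. The event $x_{r,n} > c$ occurs exactly when fewer than $r$ of the $n$ items have failed by time $c$. Each item fails by then with probability $1 - e^{-c/\theta}$. The minimal sketch below carries out this calculation. The function name is illustrative.

```python
import math

def oc_rth_failure(theta, c, r, n):
    """Pr(x_{r,n} > c | theta) for n exponential lifetimes with mean theta."""
    p = 1.0 - math.exp(-c / theta)          # chance a single item fails by time c
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(r))

for theta in (1500, 1000, 500):
    print(theta, round(oc_rth_failure(theta, 540, 10, 20), 3))
```

The same quantity is also the chance that testing stops at 540 hours with the 10th failure still unobserved, since that is precisely the event $x_{10,20} > 540$.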

The possibility of truncation arises from the fact that information 
becomes available in an ordered manner. The possibility of truncating 
even before reaching the rth observation out of n (for example, the pos- 
sibility of developing sequential procedures in problems where the data 
arise in an ordered manner) now faces us squarely. It seems fairly evi- 
dent that we need to examine the gains (either in average time neces- 
sary to make a decision or in the average number of items destroyed) 
attainable by truncated and sequential procedures. It also seems clear 
that as the theory develops, the whole problem should be considered in 
the light of decision theory. 

It appears on the surface that our results are rather specialized since
they were obtained for densities of the form $f(x; \theta) = (1/\theta)e^{-x/\theta}$, $x > 0$,
$\theta > 0$. However, similar results are valid in a wider class of situations.⁶

⁶ Most of the results achieved for the density function $(1/\theta)e^{-x/\theta}$ also hold for the cumulative
distribution

$$F(x) = \begin{cases} 1 - e^{-g(x)/\theta}, & x > 0 \quad (\theta > 0), \\ 0, & \text{otherwise}, \end{cases}$$

where $g(x)$ is a strictly increasing function of $x$ for $x \ge 0$ with $g(0) = 0$ and $g(\infty) = \infty$. For example, the
maximum likelihood estimate of $\theta$ is then given by

$$\hat\theta_{r,n} = \frac{g(x_{1,n}) + \cdots + g(x_{r,n}) + (n-r)\,g(x_{r,n})}{r}.$$

As a case in point, we may mention the Weibull distribution, where $g(x) = x^b$ ($b$ a known constant).

It is known, for example, that in life and fatigue testing, skewed distri- 
butions such as the type III, logarithmico-normal, and the Weibull dis- 
tribution (named after the Swedish physicist W. Weibull) are useful in 
fitting data. While we do not, at this stage of our research, have proofs, 
we think that we are justified in stating that for the skewed distributions
enumerated above, the utilization of the first r out of n ordered ob- 
servations and in fact just the rth out of n ordered observations will 
give decision rules which will have at least as good power as those in 
current use and will save either time or items destroyed or both. 
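The generalization in footnote 6 amounts to a change of variable. If $F(x) = 1 - e^{-g(x)/\theta}$, then $g(X)$ is exponential with mean $\theta$. The estimate is therefore equation (3) applied to $g(x_{1,n}), \ldots, g(x_{r,n})$. The sketch below checks this by simulation for a Weibull case with an assumed known shape $b = 2$. The value of $\theta$ and the sample sizes are arbitrary.

```python
import random
import statistics

def theta_hat_transformed(first_r, n, g):
    """Footnote 6: equation (3) applied to g(x_{1,n}), ..., g(x_{r,n})."""
    r = len(first_r)
    transformed = [g(x) for x in first_r]   # g increasing, so order is kept
    return (sum(transformed) + (n - r) * transformed[-1]) / r

def weibull_g(x, b=2.0):
    """g(x) = x**b for the Weibull case, with b assumed known."""
    return x ** b

theta, r, n = 4.0e6, 6, 15
rng = random.Random(4)
estimates = []
for _ in range(5000):
    # With b = 2: g(X) = theta * E, E ~ Exp(1), so X = (theta * E) ** (1 / 2).
    lives = sorted((theta * rng.expovariate(1.0)) ** 0.5 for _ in range(n))
    estimates.append(theta_hat_transformed(lives[:r], n, weibull_g))
print(statistics.fmean(estimates) / theta)
```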

In this portion of the paper we have tried to stress some of the under- 
lying ideas and the potential value of the proposed methods without 
giving any mathematical details. Some details are given in the Ap- 
pendix. 


APPENDIX 


1. Derivation of a “Best” Estimate Based on the First r out of n Ordered 
Observations Drawn from an Exponential Distribution. 


Let the following assumptions be made: 

(i) $n$ items are drawn at random from a density of the form $f(x;\theta) = (1/\theta)e^{-x/\theta}$, $x > 0$, $\theta > 0$;

(ii) the observations become available in order, so that $x_{1,n} \le x_{2,n} \le \cdots \le x_{r,n} \le \cdots \le x_{n,n}$, where by $x_{i,n}$ $(1 \le i \le n)$ is meant the $i$th smallest observation in a sample of $n$ ordered observations;

(iii) the experiment is discontinued as soon as $x_{r,n}$ has become available (i.e., after the first $r$ observations are made).

We wish under (i), (ii), and (iii) to find a “good” estimate of $\theta$ and
to give the distribution of this estimate. This objective is attained in 
the following theorem. 

Theorem 1: Under (i), (ii), and (iii) an estimate based on the first $r$ out of $n$ ordered observations which is “best” in the sense that it is maximum likelihood, unbiased, minimum variance, efficient, and sufficient is given by

$$\hat\theta_{r,n} = \frac{x_{1,n} + x_{2,n} + \cdots + x_{r,n} + (n-r)\,x_{r,n}}{r}. \tag{1}$$


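The probability density function of $\hat\theta_{r,n}$ is given by

$$f_r(y) = \frac{1}{(r-1)!}\left(\frac{r}{\theta}\right)^{r} y^{r-1} e^{-ry/\theta}, \quad y > 0; \qquad f_r(y) = 0 \text{ elsewhere}. \tag{2}$$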
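As a minimal illustrative sketch (not part of the original paper), the estimate (1) can be computed directly from the first $r$ ordered failure times. The function names `theta_hat` and `simulate_first_r` and the parameter values θ = 100, n = 20, r = 5 are hypothetical; the small simulation is only meant to show that the average of the estimates is close to θ and their variance close to θ²/r, in line with Theorem 1.

```python
import random


def theta_hat(first_r, n):
    """Estimate (1): [x_1 + ... + x_r + (n - r) * x_r] / r.

    first_r -- the r smallest failure times, in increasing order
    n       -- number of items placed on test
    """
    r = len(first_r)
    if not 1 <= r <= n:
        raise ValueError("need 1 <= r <= n")
    if any(a > b for a, b in zip(first_r, first_r[1:])):
        raise ValueError("failure times must be in increasing order")
    # Total accumulated life: r failed items plus (n - r) survivors observed up to x_r.
    total_life = sum(first_r) + (n - r) * first_r[-1]
    return total_life / r


def simulate_first_r(theta, n, r, rng):
    """Draw n exponential lifetimes with mean theta and return the r smallest."""
    lifetimes = sorted(rng.expovariate(1.0 / theta) for _ in range(n))
    return lifetimes[:r]


if __name__ == "__main__":
    rng = random.Random(1953)  # fixed seed so the run is reproducible
    theta, n, r, reps = 100.0, 20, 5, 20000
    est = [theta_hat(simulate_first_r(theta, n, r, rng), n) for _ in range(reps)]
    mean = sum(est) / reps
    var = sum((e - mean) ** 2 for e in est) / (reps - 1)
    print(f"mean of estimates: {mean:.2f}  (theta = {theta})")
    print(f"variance: {var:.1f}  (theta^2 / r = {theta ** 2 / r:.1f})")
```

Only the first $r$ failure times and the number $n$ placed on test are needed; the $(n-r)$ unfailed items enter through the factor $(n-r)x_{r,n}$.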





494 AMERICAN STATISTICAL ASSOCIATION JOURNAL, SEPTEMBER 1953


In order to show that $\hat\theta_{r,n}$ as given in (1) is the maximum likelihood estimate, we write down the joint probability density function of the first $r$ out of $n$ ordered observations $x_{1,n}, x_{2,n}, \ldots, x_{r,n}$. This is given by

$$f(x_{1,n}, x_{2,n}, \ldots, x_{r,n}; \theta) = \frac{n!}{(n-r)!}\,\frac{1}{\theta^{r}} \exp\Big\{-\Big[\sum_{i=1}^{r} x_{i,n} + (n-r)\,x_{r,n}\Big]\Big/\theta\Big\}, \tag{3}$$

$$0 \le x_{1,n} \le x_{2,n} \le \cdots \le x_{r,n} < \infty.$$

It can be shown in the usual way that $\hat\theta_{r,n}$ as given in (1) maximizes $f$ and is thus the maximum likelihood estimate.
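The sufficiency of the estimate can be verified at once by using a result in Cramér [1, p. 488], since the density (3) can be written in the form

$$f(x_{1,n}, x_{2,n}, \ldots, x_{r,n}; \theta) = g(\hat\theta_{r,n}; \theta)\, h(x_{1,n}, x_{2,n}, \ldots, x_{r,n}),$$

with $g(\hat\theta_{r,n}; \theta) = \dfrac{n!}{(n-r)!}\,\theta^{-r} e^{-r\hat\theta_{r,n}/\theta}$, where

$$h(x_{1,n}, x_{2,n}, \ldots, x_{r,n}) = \begin{cases} 1 & \text{if } 0 \le x_{1,n} \le \cdots \le x_{r,n} < \infty, \\ 0 & \text{otherwise}. \end{cases}$$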


We defer the proof that $\hat\theta_{r,n}$ is efficient, unbiased, and minimum variance until we show that the probability density function is given by (2). This we now do.

Instead of the random variables $X_{i,n}$, let us introduce $r$ new random variables $Y_{i,n}$, $1 \le i \le r$, where

$$Y_{1,n} = X_{1,n} \quad\text{and}\quad Y_{i,n} = X_{i,n} - X_{i-1,n}, \quad 2 \le i \le r. \tag{4}$$


We shall now prove Lemma 1. 

Lemma 1: The random variables $Y_{i,n}$ defined by (4) are mutually independent. Further, for each $i$, $(n-i+1)Y_{i,n}$ is distributed with common density $(1/\theta)e^{-z/\theta}$, $z > 0$, and each $Y_{i,n}$ can be considered as a random variable which is the smallest value in a random sample of size $(n-i+1)$ drawn from the parent density function.

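Proof: The transformation (4) has unit Jacobian, and $\sum_{i=1}^{r} x_{i,n} + (n-r)x_{r,n} = \sum_{i=1}^{r}(n-i+1)\,y_{i,n}$. Hence the joint probability density function of the $Y_{i,n}$ [obtained from (3)] is

$$g(y_{1,n}, y_{2,n}, \ldots, y_{r,n}) = \frac{n!}{(n-r)!}\,\frac{1}{\theta^{r}} \exp\Big\{-\sum_{i=1}^{r}(n-i+1)\,y_{i,n}\Big/\theta\Big\}, \tag{5}$$

where $0 \le y_{i,n} < \infty$, $i = 1, 2, \ldots, r$. Rewriting (5) as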





$$g(y_{1,n}, y_{2,n}, \ldots, y_{r,n}) = \prod_{i=1}^{r} \frac{n-i+1}{\theta}\, e^{-(n-i+1)\,y_{i,n}/\theta} \tag{6}$$

clearly establishes Lemma 1.
Rewriting $\hat\theta_{r,n}$ in terms of the $y_{i,n}$, (1) becomes

$$\hat\theta_{r,n} = \sum_{i=1}^{r} (n-i+1)\,y_{i,n}\big/ r. \tag{7}$$
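Lemma 1 and the identity (7) can also be checked numerically. The Python sketch below is illustrative only: the helper names and the values θ = 2, n = 12, r = 4 are hypothetical. It forms the normalized spacings $(n-i+1)Y_{i,n}$ from simulated ordered exponential samples; by (7) they sum to $r\hat\theta_{r,n}$, by Lemma 1 each should average about θ, and distinct spacings should be nearly uncorrelated. A near-zero sample correlation is of course only consistent with independence, not a proof of it.

```python
import random


def normalized_spacings(first_r, n):
    """Return (n - i + 1) * Y_{i,n} for i = 1..r, with Y_{i,n} defined as in (4)."""
    out, prev = [], 0.0
    for i, x in enumerate(first_r, start=1):
        out.append((n - i + 1) * (x - prev))
        prev = x
    return out


def mean(xs):
    return sum(xs) / len(xs)


def correlation(xs, ys):
    """Sample Pearson correlation, written out to avoid version-specific helpers."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / (sxx * syy) ** 0.5


def check_lemma_1(theta=2.0, n=12, r=4, reps=20000, seed=0):
    """Simulate ordered exponential samples and summarize the normalized spacings."""
    rng = random.Random(seed)
    cols = [[] for _ in range(r)]
    for _ in range(reps):
        xs = sorted(rng.expovariate(1.0 / theta) for _ in range(n))[:r]
        for i, z in enumerate(normalized_spacings(xs, n)):
            cols[i].append(z)
    # Lemma 1: each column should average about theta; columns should be uncorrelated.
    return [mean(c) for c in cols], correlation(cols[0], cols[1])


if __name__ == "__main__":
    means, rho = check_lemma_1()
    print("means of (n-i+1)Y_i:", [round(m, 3) for m in means])
    print("correlation of the first two:", round(rho, 4))
```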


Since the characteristic function of the density $(1/\theta)e^{-z/\theta}$ is given by $\phi_1(t) = (1 - it\theta)^{-1}$, it follows at once from Lemma 1 that

$$\phi_r(t) = \prod_{i=1}^{r} \phi_{(n-i+1)Y_{i,n}/r}(t) = \Big(1 - \frac{it\theta}{r}\Big)^{-r}. \tag{8}$$


From the uniqueness theorem for characteristic functions, it follows on inversion that the probability density function of $\hat\theta_{r,n}$ is given by

$$f_r(y) = \frac{1}{(r-1)!}\left(\frac{r}{\theta}\right)^{r} y^{r-1} e^{-ry/\theta}, \quad y > 0; \qquad f_r(y) = 0 \text{ elsewhere}. \tag{9}$$

This establishes (2) of Theorem 1.

To complete the proof of Theorem 1, we show that $\hat\theta_{r,n}$ is unbiased, efficient,⁷ and has minimum variance. The unbiasedness of $\hat\theta_{r,n}$ is a consequence of the fact that $E(\hat\theta_{r,n}) = \int_0^\infty y\, f_r(y)\,dy = \theta$.

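For efficiency and minimum variance let us compute the Cramér-Rao lower bound $1/E(\partial \log f/\partial\theta)^2$, where $f$ is given by (3). But (3) can be rewritten as

$$f(x_{1,n}, x_{2,n}, \ldots, x_{r,n}; \theta) = \frac{C}{\theta^{r}}\, e^{-r\hat\theta_{r,n}/\theta}, \quad\text{where } C = \frac{n!}{(n-r)!}. \tag{10}$$

Thus

$$\log f = \log C - r\log\theta - \frac{r\hat\theta_{r,n}}{\theta} \tag{11}$$

and

$$\frac{\partial \log f}{\partial\theta} = -\frac{r}{\theta} + \frac{r\hat\theta_{r,n}}{\theta^{2}} = \frac{r}{\theta^{2}}\big(\hat\theta_{r,n} - \theta\big). \tag{12}$$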





7 For the definition of efficiency see [1], p. 481. 







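Thus, differentiating (12) once more and using the unbiasedness of $\hat\theta_{r,n}$,

$$E\Big(\frac{\partial \log f}{\partial\theta}\Big)^{2} = -E\Big(\frac{\partial^{2} \log f}{\partial\theta^{2}}\Big) = -E\Big(\frac{r}{\theta^{2}} - \frac{2r\hat\theta_{r,n}}{\theta^{3}}\Big) = \frac{r}{\theta^{2}}. \tag{13}$$

Hence the Cramér-Rao lower bound is $\theta^2/r$.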

But $\operatorname{Var}(\hat\theta_{r,n}) = \int_0^\infty y^2 f_r(y)\,dy - \theta^2 = \theta^2/r$, and since the assumptions needed for the derivation of the Cramér-Rao lower bound are clearly met in our problem, $\hat\theta_{r,n}$ is minimum variance and efficient, since any other unbiased estimate must have variance at least equal to $\theta^2/r$. Thus Theorem 1 is completely established.
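The moment calculations used in this proof can be confirmed by numerical integration of the density (2). In the Python sketch below, the Simpson-rule helper and the truncation point of the integral are illustrative choices, not taken from the paper; it checks that $f_r$ integrates to 1, has mean θ, and has variance equal to the Cramér-Rao bound θ²/r.

```python
import math


def f_r(y, r, theta):
    """Density (2)/(9) of theta_hat_{r,n}: a gamma density with shape r and scale theta / r."""
    if y < 0:
        return 0.0
    if y == 0:
        return r / theta if r == 1 else 0.0
    log_f = (r * math.log(r / theta) + (r - 1) * math.log(y)
             - r * y / theta - math.lgamma(r))
    return math.exp(log_f)


def simpson(func, a, b, m=20000):
    """Composite Simpson rule on [a, b] with an even number m of subintervals."""
    if m % 2:
        m += 1
    h = (b - a) / m
    total = func(a) + func(b)
    for k in range(1, m):
        total += (4 if k % 2 else 2) * func(a + k * h)
    return total * h / 3


def moments(r, theta):
    """Numerical mass, mean and variance of f_r, plus the Cramer-Rao bound theta^2 / r."""
    upper = theta * (1 + 40 / math.sqrt(r))  # the neglected right tail is negligible here
    mass = simpson(lambda y: f_r(y, r, theta), 0.0, upper)
    m1 = simpson(lambda y: y * f_r(y, r, theta), 0.0, upper)
    m2 = simpson(lambda y: y * y * f_r(y, r, theta), 0.0, upper)
    return mass, m1, m2 - m1 ** 2, theta ** 2 / r


if __name__ == "__main__":
    for r in (1, 3, 10):
        mass, mean, var, bound = moments(r, theta=3.0)
        print(f"r={r}: mass={mass:.6f} mean={mean:.6f} var={var:.6f} bound={bound:.6f}")
```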


2. Distribution, Expectation and Variance of $X_{r,n}$


The random variable $X_{r,n}$ can be interpreted as the waiting time to get the $r$th failure in a sample of size $n$. The probability density function of $X_{r,n}$ is given by

$$g_{r,n}(x) = \frac{n!}{(r-1)!\,(n-r)!}\,\frac{1}{\theta}\, e^{-(n-r+1)x/\theta}\big[1 - e^{-x/\theta}\big]^{r-1}, \quad x > 0. \tag{14}$$


One can easily find $E(X_{r,n})$, the expected waiting time, directly from $\int_0^\infty x\,g_{r,n}(x)\,dx$. This gives the result

$$E(X_{r,n}) = n\binom{n-1}{r-1}\theta \sum_{k=0}^{r-1} (-1)^{k}\binom{r-1}{k}\frac{1}{(n-r+k+1)^{2}}. \tag{15}$$


A much simpler formula for $E(X_{r,n})$ will now be established. To do so we write $X_{r,n}$ as

$$X_{r,n} = X_{1,n} + (X_{2,n} - X_{1,n}) + \cdots + (X_{r,n} - X_{r-1,n}). \tag{16}$$

Hence from (16) and (4) it follows that

$$X_{r,n} = \sum_{i=1}^{r} Y_{i,n}. \tag{17}$$


Thus by Lemma 1 







$$E(X_{r,n}) = \sum_{i=1}^{r} E(Y_{i,n}) = \theta\sum_{j=1}^{r} \frac{1}{n-j+1}. \tag{18}$$

Also from Lemma 1 it follows that

$$\operatorname{Var}(X_{r,n}) = \theta^{2}\sum_{j=1}^{r} \frac{1}{(n-j+1)^{2}}. \tag{19}$$

It is left to the reader to verify the identities obtained by equating the right-hand sides of (15) and (18). It should also be pointed out that in [5], p. 324, Gumbel found formulas (18) and (19) by using the moment generating function of $X_{r,n}$.
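The identities left to the reader can be confirmed mechanically for small $n$. The Python sketch below (function names are illustrative) evaluates (15), (18) and (19) in exact rational arithmetic with θ = 1. As an extra check it also obtains the variance by integrating $x^2$ against (14) term by term, which gives $E(X_{r,n}^2) = 2n\binom{n-1}{r-1}\theta^2\sum_{k=0}^{r-1}(-1)^k\binom{r-1}{k}(n-r+k+1)^{-3}$; this intermediate formula is a step not written out in the paper.

```python
from fractions import Fraction
from math import comb


def mean_15(n, r):
    """E(X_{r,n}) / theta from formula (15), in exact rational arithmetic."""
    s = sum(Fraction((-1) ** k * comb(r - 1, k), (n - r + k + 1) ** 2)
            for k in range(r))
    return n * comb(n - 1, r - 1) * s


def mean_18(n, r):
    """E(X_{r,n}) / theta from formula (18)."""
    return sum(Fraction(1, n - j + 1) for j in range(1, r + 1))


def var_19(n, r):
    """Var(X_{r,n}) / theta^2 from formula (19)."""
    return sum(Fraction(1, (n - j + 1) ** 2) for j in range(1, r + 1))


def var_from_density(n, r):
    """Var(X_{r,n}) / theta^2 from the second moment of density (14)."""
    m2 = n * comb(n - 1, r - 1) * sum(
        Fraction(2 * (-1) ** k * comb(r - 1, k), (n - r + k + 1) ** 3)
        for k in range(r))
    return m2 - mean_15(n, r) ** 2


if __name__ == "__main__":
    for n in range(1, 16):
        for r in range(1, n + 1):
            assert mean_15(n, r) == mean_18(n, r)
            assert var_from_density(n, r) == var_19(n, r)
    print("(15) = (18) and (19) agree for all 1 <= r <= n <= 15")
    print("E(X_{3,5}) / theta =", mean_18(5, 3))
```

Exact fractions are used because the alternating sum in (15) suffers from cancellation in floating point once $r$ grows.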


3. Derivation of a “Best” Test Based on the First r out of n ordered 
Observations Drawn from an Exponential Distribution 


In this section we study the question of how “best” to use the first $r$ ordered observations (from a sample of size $n$) so as to decide between two values of $\theta$, $\theta_1$ and $\theta_2$ (where $\theta_1 > \theta_2$). By “best” we mean, according to the usual Neyman-Pearson terminology, a test which has the property that among all tests having a fixed probability $\alpha$ (size) of rejecting $\theta = \theta_1$ when true, the test in question will have the largest possible chance of rejecting $\theta = \theta_1$ when the alternative $\theta = \theta_2$ is true.

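To derive the “best” test we use the Neyman-Pearson lemma (see, e.g., [1], pp. 529-531). According to this lemma a “best” test must be one for which the region of rejection can be found from the inequality

$$f(x_{1,n}, x_{2,n}, \ldots, x_{r,n}; \theta_2)\big/ f(x_{1,n}, x_{2,n}, \ldots, x_{r,n}; \theta_1) > K. \tag{20}$$

From (1) and (3) this becomes

$$\Big(\frac{\theta_1}{\theta_2}\Big)^{r} \exp\Big\{-r\hat\theta_{r,n}\Big(\frac{1}{\theta_2} - \frac{1}{\theta_1}\Big)\Big\} > K. \tag{21}$$

Since $\theta_1$ and $\theta_2$ are preassigned constants such that $(1/\theta_2) - (1/\theta_1) > 0$, it follows at once that the region of rejection for $\theta = \theta_1$ has the form

$$\hat\theta_{r,n} < C. \tag{22}$$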


To meet the condition that the probability of rejecting $\theta = \theta_1$ when true equals $\alpha$, we need to choose $C$ so that

$$\Pr(\hat\theta_{r,n} < C \mid \theta = \theta_1) = \alpha. \tag{23}$$

To find $C$ explicitly we use Theorem 1, which states that $\hat\theta_{r,n}$ has (2) as its probability density function. From this it is very easy to













verify that $W = 2r\hat\theta_{r,n}/\theta$ is a random variable which is distributed as chi-square with $2r$ degrees of freedom. Thus (23) can be rewritten as

$$\Pr\Big(W < \frac{2rC}{\theta_1}\Big) = \alpha \tag{24}$$

or equivalently as

$$\Pr\Big(W > \frac{2rC}{\theta_1}\Big) = 1 - \alpha. \tag{25}$$

Let us denote a chi-square variable with $n$ degrees of freedom as $\chi^2(n)$ and let us define the constant $\chi^2_\gamma(n)$ by the equality

$$\Pr\big(\chi^2(n) > \chi^2_\gamma(n)\big) = \gamma.$$

Thus (25) can be written as

$$C = \theta_1\,\chi^2_{1-\alpha}(2r)/2r. \tag{26}$$


Hence (23) will be satisfied if the region of rejection for $\theta = \theta_1$ is given by

$$\hat\theta_{r,n} < \theta_1\,\chi^2_{1-\alpha}(2r)/2r. \tag{22}$$


According to the Neyman-Pearson lemma the region of rejection (22) has a greater chance of rejecting $\theta = \theta_1$ when $\theta = \theta_2$ is true than any other region which assigns probability $\alpha$ to the rejection of $\theta = \theta_1$ (when $\theta_1$ is the true value). Evidently the region (22) does not depend on the particular choice of alternative $\theta_2$. The region (22) is “best” in the Neyman-Pearson sense for any $\theta_2 < \theta_1$. Hence (22) gives a uniformly most powerful test in the Neyman-Pearson sense of the hypothesis $\theta = \theta_1$ against $\theta < \theta_1$.

It is convenient in what follows to use acceptance rather than rejection regions. Consequently the Neyman-Pearson theory tells us that a simple test for $\theta = \theta_1$ against $\theta < \theta_1$ with Type I error $= \alpha$ is given by an acceptance region of the form

$$\hat\theta_{r,n} > \theta_1\,\chi^2_{1-\alpha}(2r)/2r. \tag{22'}$$
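To make (26) concrete without printed tables, the Python sketch below computes $C$ directly. It relies on the standard identity, valid because the degrees of freedom $2r$ are even, that $\Pr(\chi^2(2r) > x) = \sum_{k=0}^{r-1} e^{-x/2}(x/2)^k/k!$, and inverts it by bisection. The function names are hypothetical. With $\theta_1 = 1$ the results can be compared with the $C/\theta_1$ column of Table II; they agree to within rounding for the cases tested.

```python
import math


def chi2_sf_even(x, df):
    """Pr(chi2(df) > x) for even df, via the Poisson-sum identity
    Pr(chi2(2r) > x) = sum_{k=0}^{r-1} exp(-x/2) (x/2)^k / k!.
    Adequate for moderate x (exp(-x/2) underflows near x = 1490)."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term = math.exp(-half)  # k = 0 term
    total = term
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return min(total, 1.0)


def chi2_point(gamma, df):
    """chi2_gamma(df): the value exceeded with probability gamma (bisection)."""
    lo, hi = 0.0, float(df)
    while chi2_sf_even(hi, df) > gamma:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if chi2_sf_even(mid, df) > gamma:
            lo = mid  # the point lies to the right of mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def acceptance_constant(theta1, alpha, r):
    """C in (26): accept theta = theta1 when theta_hat_{r,n} > C."""
    return theta1 * chi2_point(1.0 - alpha, 2 * r) / (2 * r)


if __name__ == "__main__":
    for alpha, r in ((0.05, 3), (0.01, 9), (0.10, 2)):
        print(f"alpha={alpha}, r={r}: C/theta1 = {acceptance_constant(1.0, alpha, r):.4f}")
```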


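Let us now look at the operating characteristic curve of a procedure specified by (22'), i.e., let us study

$$\begin{aligned}
L(\theta) &= \text{Probability of accepting } \theta = \theta_1 \text{ when } \theta \text{ is the true value} \\
&= \Pr\big(\hat\theta_{r,n} > \theta_1\chi^2_{1-\alpha}(2r)/2r\big) \\
&= \Pr\big(\chi^2(2r) > \theta_1\chi^2_{1-\alpha}(2r)/\theta\big),
\end{aligned} \tag{27}$$

since $2r\hat\theta_{r,n}/\theta$ is distributed as $\chi^2(2r)$ when $\theta$ is the true value. The graph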







of $L(\theta)$ for various values of $r$ and of the ratio $\theta_1/\theta$, where $\alpha = .05$, is given in Figure 1.





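[Figure 1. Operating characteristics of tests of the form $\hat\theta_{r,n} > \theta_1\chi^2_{.95}(2r)/2r$ ($1 - \alpha = .95$). Vertical axis: probability of accepting $\theta = \theta_1$ when $\theta$ is the true value; horizontal axis: $\theta_1/\theta$.]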
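The operating characteristic (27) is easy to tabulate once a chi-square tail probability is available. The Python sketch below (helper names hypothetical; it uses the same even-degrees-of-freedom identity $\Pr(\chi^2(2r) > x) = \sum_{k=0}^{r-1} e^{-x/2}(x/2)^k/k!$) evaluates $L(\theta)$ as a function of $\theta_1/\theta$ for α = .05, giving values of the kind plotted in Figure 1. By construction $L = 1 - \alpha$ at $\theta_1/\theta = 1$, and $L$ decreases as $\theta_1/\theta$ grows.

```python
import math


def chi2_sf_even(x, df):
    """Pr(chi2(df) > x) for even df (Poisson-sum identity)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    term = math.exp(-half)
    total = term
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return min(total, 1.0)


def chi2_point(gamma, df):
    """Value exceeded with probability gamma, found by bisection."""
    lo, hi = 0.0, float(df)
    while chi2_sf_even(hi, df) > gamma:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if chi2_sf_even(mid, df) > gamma:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def oc_curve(ratio, r, alpha=0.05):
    """L(theta) from (27), as a function of ratio = theta1 / theta."""
    return chi2_sf_even(ratio * chi2_point(1.0 - alpha, 2 * r), 2 * r)


if __name__ == "__main__":
    ratios = [0.5, 1.0, 1.5, 2.0, 3.0, 5.0]
    print("theta1/theta:", ratios)
    for r in (1, 3, 10, 23):
        row = "  ".join(f"{oc_curve(q, r):.3f}" for q in ratios)
        print(f"r={r:2d}: {row}")
```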


In the problem just discussed it was assumed that $r$ and $\alpha$ are known and $C$ is unknown. We shall now consider a problem where both $r$ and $C$ are initially unknown. We want to choose these unknowns in such a way that the resulting operating characteristic curve will have the property that

$$L(\theta_1) = 1 - \alpha \quad\text{and}\quad L(\theta_2) = \beta, \tag{28}$$

where $\theta_2 < \theta_1$ and $\alpha$ and $\beta$ are prescribed in advance. Meeting condition (28) means substituting $\theta_2$ for $\theta$ in (27) and requiring that $r$ be such that

$$\frac{\theta_1}{\theta_2}\,\chi^2_{1-\alpha}(2r) \ge \chi^2_\beta(2r) \quad\text{or}\quad \frac{\theta_2}{\theta_1} \le \frac{\chi^2_{1-\alpha}(2r)}{\chi^2_\beta(2r)}. \tag{29}$$


Knowing (29) makes it an easy matter to find the integer $r$ which ensures that the operating characteristic curve passes most nearly through the points $[\theta_1, L(\theta_1) = 1 - \alpha]$ and $[\theta_2, L(\theta_2) = \beta]$. It can be verified that as $r$ goes through the values $1, 2, 3, \ldots$, the ratio $\chi^2_{1-\alpha}(2r)/\chi^2_\beta(2r)$ is strictly increasing, and it is easy to show that it tends to unity. Consequently there is a smallest integer $r$ such that

$$\chi^2_{1-\alpha}(2r)/\chi^2_\beta(2r) \ge \theta_2/\theta_1. \tag{30}$$













TABLE II

VALUES OF $r$ AND ACCEPTANCE REGIONS FOR FIXED $\alpha$, $\beta$, WHERE
$\alpha$ = PROBABILITY OF REJECTING $\theta_1$ WHEN $\theta = \theta_1$; $\beta$ = PROBABILITY
OF ACCEPTING $\theta_1$ WHEN $\theta = \theta_2$; AND WHERE $\theta_1 > \theta_2$.
ACCEPTANCE REGION IS OF FORM $\hat\theta_{r,n} > C$

θ₁/θ₂ |       β = .01        |       β = .05        |       β = .10
      |   r   C/θ₁   C*/θ₁   |   r   C/θ₁   C*/θ₁   |   r   C/θ₁   C*/θ₁

α = .01
 3/2  | 136   .8114  .8068   | 101   .7831  .7794   |  83   .7625  .7620
 2    |  46   .6892  .6873   |  35   .6492  .6466   |  30   .6247  .6220
 5/2  |  27   .6073  .6005   |  21   .5631  .5536   |  18   .5342  .5246
 3    |  19   .5445  .5365   |  15   .4985  .4864   |  13   .4692  .4559
 4    |  12   .4523  .4477   |  10   .4130  .3926   |   9   .3897  .3610
 5    |   9   .3897  .3867   |   8   .3633  .3287   |   7   .3329  .3009
 10   |   5   .2558  .2321   |   4   .2058  .1938   |   4   .2058  .1670

α = .05
 3/2  |  95   .8374  .8360   |  67   .8079  .8059   |  55   .7890  .7841
 2    |  33   .7319  .7244   |  23   .6834  .6830   |  19   .6548  .6515
 5/2  |  19   .6548  .6438   |  14   .6046  .5905   |  11   .5608  .5602
 3    |  13   .5915  .5852   |  10   .5426  .5235   |   8   .4976  .4905
 4    |   9   .5217  .4834   |   7   .4694  .4230   |   6   .4355  .3864
 5    |   7   .4694  .4163   |   5   .3940  .3661   |   4   .3416  .3341
 10   |   4   .3416  .2511   |   3   .2725  .2099   |   3   .2725  .1774

α = .10
 3/2  |  77   .8570  .8560   |  52   .8269  .8239   |  41   .8058  .8031
 2    |  26   .7583  .7559   |  18   .7123  .7084   |  15   .6866  .6710
 5/2  |  15   .6866  .6785   |  11   .6383  .6168   |   9   .6036  .5775
 3    |  11   .6383  .6104   |   8   .5820  .5478   |   6   .5253  .5153
 4    |   7   .5564  .5204   |   5   .4865  .4577   |   4   .4363  .4176
 5    |   5   .4865  .4642   |   4   .4363  .3877   |   3   .3673  .3548
 10   |   3   .3673  .2802   |   2   .2660  .2372   |   2   .2660  .1945

For the acceptance region $\hat\theta_{r,n} > C^*$, $L(\theta_1) \ge 1 - \alpha$ and $L(\theta_2) = \beta$.
For the acceptance region $\hat\theta_{r,n} > C$, $L(\theta_1) = 1 - \alpha$ and $L(\theta_2) \le \beta$.
For any $C'$ such that $C^* < C' < C$, the acceptance region $\hat\theta_{r,n} > C'$ has $L(\theta_1) > 1 - \alpha$ and $L(\theta_2) < \beta$.


This is the value of $r$ which we wish to use. If, with this value of $r$, we use an acceptance region for $\theta = \theta_1$ of the form

$$\hat\theta_{r,n} > C, \quad\text{where } C = \theta_1\chi^2_{1-\alpha}(2r)/2r, \tag{31}$$










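we shall have a test whose operating characteristic curve is such that $L(\theta_1) = 1 - \alpha$ and $L(\theta_2) \le \beta$. [Incidentally, a region of acceptance for $\theta = \theta_1$ of the form $\hat\theta_{r,n} > C^*$, where $C^* = \theta_2\chi^2_\beta(2r)/2r$, will give for the same $r$ an operating characteristic curve such that $L(\theta_1) \ge 1 - \alpha$ and $L(\theta_2) = \beta$.]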

In summary, we have shown that given $\alpha$, $\beta$ and the ratio $\theta_1/\theta_2$ it is possible to find an $r$ and $C$ and a region of acceptance for $\theta = \theta_1$ of the form $\hat\theta_{r,n} > C$ such that $L(\theta_1) = 1 - \alpha$ and $L(\theta_2) \le \beta$. The computations for $r < 100$ were made using a table of $\chi^2$, particularly as extended by Catherine Thompson [8], and a Bureau of Standards compilation [2]. For $r \ge 100$, the Fisher form of the normal approximation to $\chi^2$ with $2r$ degrees of freedom was used in computing $\chi^2_{1-\alpha}(2r)$ and $\chi^2_\beta(2r)$. For certain selected values of $\alpha$ and $\beta$ the computations can be further simplified by using [3] and [4]. In Table II we give test procedures (i.e., regions of acceptance) for various selected values of $\alpha$ and $\beta$ and the ratio $\theta_1/\theta_2$. It is evident from this table how we obtained the numerical values for the special example given in the expository part of this paper.
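The design procedure (29) to (31) can be automated. The Python sketch below (function names hypothetical) searches for the smallest $r$ satisfying (30) and returns $C/\theta_1$ and $C^*/\theta_1$, computing exact chi-square percentage points from the even-degrees-of-freedom identity $\Pr(\chi^2(2r) > x) = \sum_{k=0}^{r-1} e^{-x/2}(x/2)^k/k!$ instead of reading printed tables. For the three cases run below it agrees with Table II to within rounding; for rows with $r \ge 100$, where the table relied on Fisher's normal approximation, the last digit may differ.

```python
import math


def chi2_sf_even(x, df):
    """Pr(chi2(df) > x) for even df, by the Poisson-sum identity."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    term = math.exp(-half)  # k = 0 term; adequate for the moderate x used here
    total = term
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return min(total, 1.0)


def chi2_point(gamma, df):
    """chi2_gamma(df): the value exceeded with probability gamma (bisection)."""
    lo, hi = 0.0, float(df)
    while chi2_sf_even(hi, df) > gamma:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if chi2_sf_even(mid, df) > gamma:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def design_test(alpha, beta, ratio, r_max=1000):
    """Smallest r satisfying (30), with C/theta1 from (31) and C*/theta1.

    ratio is theta1 / theta2 (> 1). Returns (r, C/theta1, C*/theta1).
    """
    if ratio <= 1:
        raise ValueError("need theta1 / theta2 > 1")
    for r in range(1, r_max + 1):
        low = chi2_point(1.0 - alpha, 2 * r)   # chi2_{1-alpha}(2r)
        high = chi2_point(beta, 2 * r)         # chi2_beta(2r)
        if low / high >= 1.0 / ratio:          # condition (30)
            c = low / (2 * r)                  # C / theta1
            c_star = high / (2 * r) / ratio    # C*/theta1 = (theta2/theta1) chi2_beta(2r)/2r
            return r, c, c_star
    raise ValueError("no r <= r_max satisfies (30)")


if __name__ == "__main__":
    for alpha, beta, ratio in ((0.05, 0.05, 2.0), (0.01, 0.01, 10.0), (0.10, 0.05, 3.0)):
        r, c, c_star = design_test(alpha, beta, ratio)
        print(f"alpha={alpha} beta={beta} theta1/theta2={ratio}: "
              f"r={r}, C/theta1={c:.4f}, C*/theta1={c_star:.4f}")
```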

Remark: One can, in a completely analogous way, find a uniformly most powerful test in the Neyman-Pearson sense for testing $\theta = \theta_1$ against the one-sided class of alternatives $\theta > \theta_1$. In this case the region of acceptance for $\theta = \theta_1$ is of the form $\hat\theta_{r,n} < K$, where $K$ is such that $\Pr(\hat\theta_{r,n} < K \mid \theta_1 \text{ true}) = 1 - \alpha$.


4. Truncated Tests 


Mathematical details for truncated and sequential tests will be given 
in other publications. 


REFERENCES 


[1] Cramér, Harald, Mathematical Methods of Statistics. Princeton: Princeton 
University Press, 1946. 

[2] Deming, Lola S., “Some percentage points of the $\chi^2$ distribution,” National
Applied Mathematics Laboratory Report 51-2, August 1950, Statistical Engi- 
neering Laboratory, National Bureau of Standards. 

[3] Eisenhart, Churchill, “Planning and interpreting experiments for comparing 
two standard deviations,” Chapter 8 in Statistical Research Group, Colum- 
bia University, Techniques of Statistical Analysis. New York: McGraw-Hill 
Book Company, 1947, 267-318. 

[4] Ferris, C. D., Grubbs, F. E., and Weaver, C. L., “Operating characteristics 
for the common statistical tests of significance,” Annals of Mathematical 
Statistics, 17 (1946), 178-97. 

[5] Gumbel, E. J., “Les intervalles extrêmes entre les émissions radio-actives,”
Journal de Physique et le Radium, 8, ser 7 (1937), 321-29. 







[6] Halperin, Max, “Maximum likelihood estimation in truncated samples.” 
Annals of Mathematical Statistics, 23 (1952), 226-38. 

[7] Jacobson, P. H., “The relative power of three statistics for small sample
destructive tests,” Journal of the American Statistical Association, 42 (1947),
575-84.

[8] Thompson, Catherine M., “Tables of the percentage points of the $\chi^2$ distribution,”
Biometrika, 32 (1941), 187-91.

[9] Walsh, J. E., “Some estimates and tests based on the r smallest values in a
sample,” Annals of Mathematical Statistics, 21 (1950), 386-97. 





METHODS OF MEASURING USEFUL LIFE OF EQUIPMENT
UNDER OPERATIONAL CONDITIONS*

Leo A. Goodman
University of Chicago

1. INTRODUCTION AND SUMMARY
   1.1. Introduction
   1.2. Summary
2. COMPARING TWO TYPES OF EQUIPMENT
   2.1. A Symmetric Type of Replacement Policy
      2.1.1. Definition of the symmetric replacement policy
      2.1.2. Population and sample composition
      2.1.3. Estimating relative longevities
      2.1.4. Making decisions and testing hypotheses concerning relative longevities
      2.1.5. Estimating relative longevities from several inspections
      2.1.6. An advantage of the symmetric type of replacement policy
      2.1.7. A logistical consideration
      2.1.8. Estimating absolute longevities
      2.1.9. Comparison of switch and 50-50 policies
      2.1.10. Length of the inspection interval
      2.1.11. Length of the replacement interval
      2.1.12. Continuous inspection and replacement
   2.2. A General Type of Replacement Policy
      2.2.1. Definition of the type of replacement policy
      2.2.2. Estimating and testing hypotheses concerning relative longevities
      2.2.3. A logistical consideration and estimates of absolute longevities   516
3. A NUMERICAL ILLUSTRATION   517
4. COMPARING k TYPES OF EQUIPMENT   518
   4.1. Definition of the Type of Replacement Policy and Estimates of Relative Longevities   518
   4.2. A Logistical Consideration and Estimates of Absolute Longevities   520
5. THE NONPARAMETRIC NATURE OF THE ASYMPTOTIC RESULTS   521

APPENDIX
   A1. The Difference Equation
   A2. The Mean Square Error
   A3. A Theorem on Binomial Variates
   A4. An Impossibility Theorem
   A5. The Necessary Supply of Replacements
   A6. Comparison of the Switch and 50-50 Policies
   A7. An Advantage to Replacement at Each Inspection
   A8. The Differential Equation
REFERENCES





* Based on research supported by the Office of Naval Research at the Statistical Research Center, 
University of Chicago. I wish to thank Merrill M. Flood, Columbia University, for bringing this problem 
to my attention. 


503 







The problem discussed is that of comparing the longevities
of two or more types of equipment under operational
conditions where it is not convenient to identify or keep records of
individual items. Such a comparison can be made by adopting 
certain replacement policies and observing their effect on the 
composition of the population. Methods of estimating relative 
and absolute longevities are given for the case where k types 
of equipment are being compared and various logistical re- 
quirements are placed upon the replacement policies. Methods 
of making decisions and testing hypotheses concerning the rel- 
ative and absolute longevities are also given. Replacement 
policies are given which, under certain conditions, are opti- 
mum for purposes of studying longevity. 


1. INTRODUCTION AND SUMMARY 


1.1. Introduction 


A COMPARISON of the longevities of two or more types of equipment under operational conditions where it is not convenient to identify or keep records of individual items can be made by adopting a certain replacement policy and observing its effect on the composition of the population. When only two types are being compared, for example, the policy might be that when an item fails it will be replaced by one of the opposite type. Then the composition of the population at any time (that is, the proportions of the different types among all the items in use) will depend upon the original composition of the population, the time elapsed, and the longevities of the different types. Since the original composition and the elapsed time are known, by determining the new composition of the population we can obtain information concerning the longevities of the different types of equipment.


1.2. Summary 


The problem of comparing two types of equipment is analyzed in 
detail when it is assumed that the equipment is subject to a constant 
risk. Later we see that if a replacement policy is used for a long period 
of time, the results obtained under the assumption of a constant risk 
remain valid even when the risk is not constant. A type of replacement 
policy is investigated for which approximately equal numbers of units 
of each type of equipment are used as replacements. An example of 
such a replacement policy is: Half of the replacements will be of one 
type of equipment, and half of the other. This “50-50” policy is de- 
scribed by J. L. Glathart and F. W. Preston [4] and attributed to 
Merrill M. Flood. Another replacement policy of this type is: When an 





LIFE OF EQUIPMENT UNDER OPERATIONAL CONDITIONS 505 


item fails, replace it by one of the opposite type (“switch”). We shall 
see that this “switch” policy has certain advantages over all other 
policies of the type investigated. Under certain conditions, for example, 
the estimates of the relative longevities when using a switch policy are 
less biased and have a smaller mean square error than the estimates 
obtained under other policies of the kind investigated. Correspond- 
ingly, the power of tests of various hypotheses concerning the relative 
longevities is greater when the switch policy is used. 

Types of replacement policies which satisfy different logistical re- 
quirements are also investigated. For example, it might be necessary 
to use as replacements only half as many items of one type as of the 
other. Methods of estimating and testing hypotheses concerning the 
relative longevities are given which may be adopted when these re- 
placement policies are in use. 

If the replacement policies have been used for a long time, only 
estimates of population composition are needed for estimating or for 
testing hypotheses concerning relative longevities. If information 
about the stock is also available (i.e., knowledge of the total number 
of items that have been replaced), we can estimate the absolute longevi- 
ties of the individual types of equipment; and correspondingly, hy- 
potheses concerning the absolute longevities can be tested. A numerical 
illustration is presented. 

Methods of estimating relative and absolute longevities are given 
also for the case where we wish to compare k types of equipment and 
where various logistical requirements are placed upon the replacement 
rules. 

The work presented herein may be considered a special application 
of renewal theory and the theory of Markov chains. An excellent exposi- 
tion of these theories is given by Feller in [3], Chapters 12, 13, and 15. 


2. COMPARING TWO TYPES OF EQUIPMENT 
2.1. A Symmetric Type of Replacement Policy 


2.1.1. Definition of the symmetric replacement policy. Consider the 
following type of replacement policy: When an item fails, the probabil- 
ity is p (assumed to be greater than zero) that its replacement will be 
of the opposite type. That is, about 100p% of the replacements will be
of the type different from the item that failed and about 100(1 − p)% of
the replacements will be of the same type as the failure. For example,
when p = 1 we have a “switch” policy, and when p = 1/2 we have a
“50-50” policy. We shall first consider the case where inspections are 
made at periodic intervals, at which time the items found to have 







failed are replaced. In Section 2.1.12 the case of instantaneous replacement upon failure is considered.

2.1.2. Population and sample composition. Let $f_i$ be the probability (assumed to be greater than zero) that an item of type $i$ ($i = 1$ or 2), which had not failed at time $t_1$ (say, the third week), will have failed by the next time $t_2$ it is inspected (say, the fourth week). We write $1 - f_i = s_i$ for the probability that an item of type $i$ will survive the entire period between inspections. Then the probability is $s_i^{x-1}f_i$ that an item of type $i$ will be found on the $x$th inspection to have failed since the $(x-1)$th inspection. The mean length of life for items of type $i$ is

$$\sum_{x=1}^{\infty} x\,s_i^{x-1} f_i = f_i/(1 - s_i)^2 = 1/f_i = L_i. \tag{1}$$

That is, the length of life of items of type $i$ has a negative binomial distribution where the longevity is equal to $1/f_i = L_i$. If a replacement policy is used for a long period of time, the results obtained when the length of life has a negative binomial distribution with mean $L_i$ will remain valid also in cases where the distribution of length of life has mean $L_i$ but is not negative binomial (see Section 5).

Consider the effect of adopting the symmetric replacement rule defined in Section 2.1.1 upon the population composition. For simplicity, we consider the case where the population was initially composed of $N$ items of each type of equipment. If an item is drawn at random from the $2N$ items of which the population is composed at the $x$th inspection, the probability that it is of type 1 is

$$\Pr_x(1 \mid p) = \frac{L_1}{L_1 + L_2} + \frac{1}{2}\Big[\frac{L_2 - L_1}{L_1 + L_2}\Big]\big[1 - p(f_1 + f_2)\big]^{x}. \tag{2}$$


If $2n$ items are drawn at random from the $2N$ items of which the population is composed at the $x$th inspection, then the number $n_1$ of items of type 1 has a binomial distribution with parameters $2n$ and $\Pr_x(1 \mid p)$. That is, the chance that $n_1$ items in the sample will be of type 1 is $C_{n_1}^{2n}[\Pr_x(1 \mid p)]^{n_1}[\Pr_x(2 \mid p)]^{2n - n_1}$, where $C_z^y = y!/[z!(y-z)!]$. Hence the expected value of the proportion $n_1/2n$ of items of type 1 among the sample of $2n$ items from the population composition at the $x$th inspection is

$$E_x(n_1/2n \mid p) = \Pr_x(1 \mid p). \tag{3}$$


The number of items of type 1 among the $2N$ items of which the population is composed at the $x$th inspection has a binomial distribution with parameters $2N$ and $\Pr_x(1 \mid p)$. (See Appendix A1 for proofs of statements presented in this section.)
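Formula (2) can be checked against a direct calculation. In the Python sketch below, all names and the values $f_1 = .2$, $f_2 = .1$ (so $L_1 = 5$ and $L_2 = 10$ inspection periods) are hypothetical. `pr_type1` evaluates (2); `pr_type1_recursive` steps the one-period transition forward from the initial equal composition (a type 1 position switches with probability $pf_1$, a type 2 position with probability $pf_2$); and `simulate_fraction` follows $2N$ individual items under the symmetric replacement rule. The simulation is compared with (2) only through its average.

```python
import random


def pr_type1(x, f1, f2, p):
    """Closed form (2): probability that a randomly drawn item is of type 1
    at the x-th inspection, starting from N items of each type."""
    L1, L2 = 1.0 / f1, 1.0 / f2
    return L1 / (L1 + L2) + 0.5 * (L2 - L1) / (L1 + L2) * (1 - p * (f1 + f2)) ** x


def pr_type1_recursive(x, f1, f2, p):
    """Same quantity by iterating one inspection period at a time."""
    q = 0.5
    for _ in range(x):
        q = q * (1 - p * f1) + (1 - q) * p * f2
    return q


def simulate_fraction(x, f1, f2, p, N, rng):
    """Monte Carlo: fraction of type 1 among 2N items after x inspections."""
    types = [1] * N + [2] * N
    for _ in range(x):
        for k, t in enumerate(types):
            if rng.random() < (f1 if t == 1 else f2):  # item failed during this period
                if rng.random() < p:                   # replaced by the opposite type
                    types[k] = 3 - t
    return types.count(1) / (2 * N)


if __name__ == "__main__":
    f1, f2, x = 0.2, 0.1, 5  # hypothetical failure probabilities per period
    for p in (0.5, 1.0):
        print(f"p={p}: (2) gives {pr_type1(x, f1, f2, p):.4f}, "
              f"recursion gives {pr_type1_recursive(x, f1, f2, p):.4f}")
    rng = random.Random(42)
    sims = [simulate_fraction(x, f1, f2, 1.0, 500, rng) for _ in range(100)]
    print("simulated mean fraction (p = 1):", round(sum(sims) / len(sims), 4))
```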

2.1.3. Estimating relative longevities. By simply observing the composition of a sample of $2n$ items which are in use at the $x$th inspection we can obtain an estimate of the relative longevities of the types of equipment. That is, the proportion $n_1/2n$ of items of type 1 is an estimate of the relative longevity $L_1/(L_1 + L_2)$. The bias of the estimate $n_1/2n$ of the relative longevity is

$$B(p) = E_x(n_1/2n \mid p) - \frac{L_1}{L_1 + L_2} = \frac{1}{2}\Big[\frac{L_2 - L_1}{L_1 + L_2}\Big]\big[1 - p(f_1 + f_2)\big]^{x}, \tag{4}$$

which becomes negligible when the replacement policy is in use for a long period of time. We shall deal with the case where $f_1 + f_2 \le 1$. In this case the absolute bias $|B(p)|$ is a monotonically decreasing function of $p$ and, therefore, is minimized when $p = 1$. That is, among all replacement policies of the symmetric type investigated, the switch policy leads to estimates which are least biased. Also, the bias approaches zero most rapidly when the estimate of the relative longevity is made from data resulting from the switch policy.

The mean square error of the estimate $n_1/2n$ is

$$E_x\big\{[n_1/2n - L_1/(L_1 + L_2)]^2 \mid p\big\} = \Pr_x(1 \mid p)\Pr_x(2 \mid p)/2n + B^2(p). \tag{5}$$

Among all replacement policies of the type investigated, the switch policy leads to estimates which have the smallest mean square error (see Appendix A2). Also, the mean square error approaches $L_1L_2/[(L_1 + L_2)^2\,2n]$ most rapidly when the estimate of relative longevity is made from data resulting from the switch policy.

Since the mean square error is approximately

$$\frac{L_1}{L_1 + L_2}\Big(1 - \frac{L_1}{L_1 + L_2}\Big)\Big/2n$$

when the replacement policy has been used for a long period of time, the mean square error can also be estimated from the sample composition. An estimate of the mean square error is

$$\frac{n_1}{2n}\Big(1 - \frac{n_1}{2n}\Big)\Big/2n = n_1 n_2/8n^3.$$







An estimate of the square root of the mean square error is

$$\frac{1}{2n}\sqrt{n_1 n_2/2n}. \tag{6}$$
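Given only the sample counts, the estimate of relative longevity and its estimated error are one-line computations. The Python sketch below (function names and the counts $n_1 = 60$, $n_2 = 40$ are hypothetical) returns $n_1/2n$, the mean-square-error estimate $n_1n_2/8n^3$ and its square root (6). It also evaluates the bias (4), so that the switch policy ($p = 1$) can be compared with the 50-50 policy ($p = 1/2$) for assumed values of $f_1$, $f_2$ and $x$.

```python
import math


def relative_longevity_estimate(n1, n2):
    """From a sample of 2n = n1 + n2 items: the estimate n1/2n of L1/(L1 + L2),
    the estimated mean square error n1 n2 / 8n^3, and its square root (6)."""
    two_n = n1 + n2
    if two_n <= 0:
        raise ValueError("empty sample")
    est = n1 / two_n
    mse = n1 * n2 / two_n ** 3  # (2n)^3 = 8 n^3
    return est, mse, math.sqrt(mse)


def bias(x, f1, f2, p):
    """Bias (4) of n1/2n at the x-th inspection under replacement probability p."""
    L1, L2 = 1.0 / f1, 1.0 / f2
    return 0.5 * (L2 - L1) / (L1 + L2) * (1 - p * (f1 + f2)) ** x


if __name__ == "__main__":
    est, mse, root = relative_longevity_estimate(60, 40)  # hypothetical counts
    print(f"estimate {est:.3f}, estimated MSE {mse:.5f}, root {root:.4f}")
    for p in (0.5, 1.0):  # 50-50 policy versus switch policy
        print(f"p={p}: bias after 5 inspections = {bias(5, 0.2, 0.1, p):.4f}")
```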


2.1.4. Making decisions and testing hypotheses concerning relative longevities. Consider the problem of choosing the type of equipment which has the greater longevity. A reasonable symmetric procedure would be to choose type 1 if $n_2 \le d$ and to choose type 2 if $n_1 \le d$, where $d < n$ is a constant to be determined by the amount of indecision we are willing to permit. (When $d = n - 1$ there will be indecision only when the number of type 1 items exactly equals the number of type 2 items in the sample of $2n$ items from the population composition at the $x$th inspection.) The probability of making an incorrect decision, say, of choosing type 2 when in fact type 1 has the greater longevity, is

$$P = \Pr_x(n_1 \le d) = \sum_{n_1=0}^{d} C_{n_1}^{2n}[\Pr_x(1 \mid p)]^{n_1}[\Pr_x(2 \mid p)]^{2n - n_1}, \tag{7}$$

where $\Pr_x(1 \mid p) > 1/2$. (If $L_1 > L_2$, then $\Pr_x(1 \mid p) > 1/2$ when $x > 0$.) A proof of the fact that $P$ is a decreasing function of $\Pr_x(1 \mid p)$ is given in Appendix A3. Also, $\Pr_x(1 \mid p)$ is an increasing function of $p$ (see Appendix A1). Hence $P$ is a decreasing function of $p$ and is minimized when $p = 1$. That is, among all replacement policies of the type investigated, the switch policy minimizes the probability of making an incorrect decision as to which type of equipment has the greater longevity. The switch policy also maximizes the probability of making a correct choice of equipment. Also, $P$ approaches

$$\sum_{n_1=0}^{d} C_{n_1}^{2n} L_1^{n_1} L_2^{2n - n_1}\big/(L_1 + L_2)^{2n} \tag{8}$$

most rapidly when the switch policy is used.

The switch policy also has similar optimum properties in the case where, say, the hypothesis $H$ that $L_1 = L_2$ is to be tested at a level of significance $\alpha$ against the alternative that $L_1 > L_2$. Then $d$ is the smallest integer such that

$$\sum_{n_1=0}^{d} C_{n_1}^{2n}\, 2^{-2n} \ge 1 - \alpha,$$

and $H$ is accepted when $n_1 \le d$ and rejected otherwise.
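The critical number $d$ and the error probability (7) can be computed exactly for moderate sample sizes. In the Python sketch below (names hypothetical), `critical_d` applies the rule just stated using exact integer and rational arithmetic, and `prob_incorrect` evaluates (7) for a given value of $\Pr_x(1 \mid p)$. The example values ($2n = 20$, α = .05) are illustrative only.

```python
from fractions import Fraction
from math import comb


def critical_d(n, alpha):
    """Smallest d with sum_{k=0}^{d} C(2n, k) 2^(-2n) >= 1 - alpha (exact arithmetic)."""
    total = 4 ** n                      # 2^(2n)
    need = (1 - Fraction(alpha)) * total
    cum = 0
    for d in range(2 * n + 1):
        cum += comb(2 * n, d)
        if cum >= need:
            return d
    return 2 * n


def prob_incorrect(n, d, q1):
    """P in (7): Pr(n1 <= d) when n1 is binomial(2n, q1), q1 = Pr_x(1 | p).
    Uses floating point, so it is meant for moderate 2n."""
    q2 = 1.0 - q1
    return sum(comb(2 * n, k) * q1 ** k * q2 ** (2 * n - k) for k in range(d + 1))


if __name__ == "__main__":
    n, alpha = 10, 0.05                 # hypothetical: a sample of 2n = 20 items
    d = critical_d(n, alpha)
    print("d =", d, " size under H:", round(1 - prob_incorrect(n, d, 0.5), 4))
    for q1 in (0.55, 0.65, 0.75):
        print(f"Pr_x(1|p) = {q1}: P(n1 <= {n - 1}) = {prob_incorrect(n, n - 1, q1):.4f}")
```

The decreasing values of $P$ as $\Pr_x(1 \mid p)$ grows illustrate the monotonicity proved in Appendix A3.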





When $2n$ is large,

$$y = \big[n_1 - 2n\Pr_x(1 \mid p)\big]\Big/\sqrt{2n\Pr_x(1 \mid p)\Pr_x(2 \mid p)} \tag{9}$$

is approximately normally distributed with zero mean and unit variance, and hence

$$P = \Pr_x(n_1 \le d) \approx \Pr\Big\{y \le \big[d - 2n\Pr_x(1 \mid p)\big]\Big/\sqrt{2n\Pr_x(1 \mid p)\Pr_x(2 \mid p)}\Big\}. \tag{10}$$

This formula can be used by the experimenter to determine the sample size $2n$ and the number $x$ of inspections which will be necessary in order to guarantee that the chance of an incorrect choice of equipment (or an error of the second kind) will equal a preassigned amount $P$.
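As a sketch of how (10) might be used for planning, the Python code below (names hypothetical) compares the normal approximation with the exact binomial sum (7), then searches for the smallest $n$ for which the approximate $P$ falls to a target value. It assumes the choice $d = n - 1$ (indecision only on a tie) and computes $\Pr_x(1 \mid p)$ from (2). The values $f_1 = .1$, $f_2 = .2$ (so $L_1 = 10 > L_2 = 5$), $p = 1$ and a target of .05 are assumptions for illustration.

```python
import math


def pr_type1(x, f1, f2, p):
    """Formula (2) for Pr_x(1 | p)."""
    L1, L2 = 1.0 / f1, 1.0 / f2
    return L1 / (L1 + L2) + 0.5 * (L2 - L1) / (L1 + L2) * (1 - p * (f1 + f2)) ** x


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_incorrect_normal(n, d, q1):
    """Normal approximation (10) to P = Pr_x(n1 <= d), where q1 = Pr_x(1 | p)."""
    return normal_cdf((d - 2 * n * q1) / math.sqrt(2 * n * q1 * (1 - q1)))


def prob_incorrect_exact(n, d, q1):
    """Exact binomial sum (7), for comparison with the approximation."""
    return sum(math.comb(2 * n, k) * q1 ** k * (1 - q1) ** (2 * n - k)
               for k in range(d + 1))


def smallest_n(q1, target, n_max=100000):
    """Smallest n whose approximate P is at most target, with the assumed
    choice d = n - 1 (indecision only when n1 = n2)."""
    for n in range(1, n_max + 1):
        if prob_incorrect_normal(n, n - 1, q1) <= target:
            return n
    raise ValueError("target not reached for n <= n_max")


if __name__ == "__main__":
    print("exact", round(prob_incorrect_exact(50, 50, 0.6), 4),
          "approx", round(prob_incorrect_normal(50, 50, 0.6), 4))
    for x in (1, 5, 20):                 # number of inspections
        q1 = pr_type1(x, 0.1, 0.2, 1.0)  # hypothetical L1 = 10, L2 = 5, switch policy
        n = smallest_n(q1, 0.05)
        print(f"x={x}: Pr_x(1|p)={q1:.4f}, sample size 2n={2 * n}")
```

The output shows the trade-off described in the text: a larger number of inspections $x$ moves $\Pr_x(1 \mid p)$ away from 1/2 and so reduces the sample size needed.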

The test discussed herein for the hypothesis that $L_1 = L_2$ against the alternative that $L_1 > L_2$ may be considered a test of the hypothesis that the proportion of type 1 items among the $2N$ items of which the population is composed at the $x$th inspection is 1/2 against the alternate hypothesis that this proportion is greater than 1/2. In other words, the test discussed herein is essentially a test of the mean of a binomial distribution. Sequential methods have been developed for testing the mean of a binomial distribution (see [16], Chapter 5). A detailed and nonmathematical discussion of this problem, together with a number of tables, charts, and computational simplifications, is contained in [5]. If the population is composed of many items ($2N$ is large) and the replacement policy has been in use for a long period of time ($x$ is large), the sequential method can be directly used to test hypotheses concerning the relative longevities. For example, if the null hypothesis is that $L_1/(L_1 + L_2) = .1$ and the alternate hypothesis is that $L_1/(L_1 + L_2) = .3$, the acceptance and rejection numbers which describe the sequential test are given in Table 5 of [6], p. 93, when the desired probabilities of making errors of type I and II are $\alpha = .02$ and $\beta = .03$, respectively (see [6], p. 90). The number of items of type 1 observed (number of defects observed) is recorded as in Table 5 of [6] (or graphed as in Fig. 11 of [6], p. 94) until the procedure leads to acceptance or rejection of the null hypothesis.

2.1.5. Estimating relative longevities from several inspections. In the preceding sections the only information used in estimating relative longevities was obtained by studying the composition of a sample of $2n$ items from the population of $2N$ items on the $x$th inspection. In this section, information obtained by studying the composition of samples of $2n$ items from the population on the $x$th, $2x$th, $3x$th, $\ldots$, $kx$th inspection will be used to estimate the relative longevities. A simple estimate of the relative longevity of type 1 items would be the proportion $n_1(k)/2nk$ of type 1 items among the $2nk$ items observed in the $k$ samples; that is, $n_1(k) = \sum_{i=1}^{k} n_{1i}$, where $n_{1i}$ is the number of type 1 items observed at the $ix$th inspection.

The bias $B'(p)$ of this estimate is minimized when the switch policy is used. When $x$ is large (and/or when $n/N$ is small) we can assume that the dependence between the $k$ samples is negligible. The mean square error of the estimate can then be determined approximately from the usual formula for the mean square error of a sum. Hence, the mean square error of the estimate is about

$$\sum_{i=1}^{k} \mathrm{Pr}_{ix}(1 \mid p)\,\mathrm{Pr}_{ix}(2 \mid p)\big/2nk^2 + B'^2(p).$$

When the switch policy is used, the mean square error approaches $L_1L_2/[(L_1+L_2)^2\,2nk]$ more rapidly than under other policies of the type investigated.
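The approximate mean square error above can be evaluated numerically. The sketch below assumes, as the text does, that the $k$ samples are independent. It uses equation (49) for $\mathrm{Pr}_{ix}(1 \mid p)$ and the tumbler parameters of Section 3. The function names are hypothetical. Under these assumptions the switch policy ($p = 1$) gives a smaller value than the 50-50 policy, and its value is already very close to $L_1L_2/[(L_1+L_2)^2\,2nk]$.

```python
def pr_type1(x, p, f1, f2):
    """Pr_x(1 | p) from equation (49), starting from N items of each type."""
    L1, L2 = 1 / f1, 1 / f2
    q = 1 - p * (f1 + f2)
    return L1 / (L1 + L2) + (L2 - L1) / (2 * (L1 + L2)) * q ** x


def pooled_mse(x, k, n, p, f1, f2):
    """Approximate MSE of n1(k)/2nk from samples of 2n items at x, 2x, ..., kx.

    Assumes the k samples are independent (x large and/or n/N small).
    """
    L1, L2 = 1 / f1, 1 / f2
    probs = [pr_type1(i * x, p, f1, f2) for i in range(1, k + 1)]
    variance = sum(pr * (1 - pr) for pr in probs) / (2 * n * k ** 2)
    bias = sum(probs) / k - L1 / (L1 + L2)
    return variance + bias ** 2


limit = 2 * 4 / ((2 + 4) ** 2 * 2 * 15 * 4)   # L1 L2 / [(L1 + L2)^2 2nk]
print(pooled_mse(10, 4, 15, 1.0, 0.5, 0.25), pooled_mse(10, 4, 15, 0.5, 0.5, 0.25), limit)
```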

2.1.6. An advantage of the symmetric type of replacement policy. One of the advantages of replacement policies of the type investigated is that, on the average, they keep more items of the type with the greater longevity in use. That is, when $L_1/(L_1+L_2) > 1/2$, the average number of items of type 1 in use at any inspection time $x$ is

$$2N\,\mathrm{Pr}_x(1 \mid p) > N. \tag{11}$$

This average number approaches $2NL_1/(L_1+L_2)$ when the replacement rule is in use for a long period of time. Since $2N\,\mathrm{Pr}_x(1 \mid p)$ is a monotone function of $p$, the switch policy keeps in use, on the average, more of the type with the greater longevity than the other policies of the type investigated. This effect occurs immediately (even at the first inspection) and increases with the length of time the replacement rule is in use. Hence, even if the sample composition is never observed to decide which type has the greater longevity, the replacement policy ensures that, on the average, more of the longer-lived type will be in use.

Aside from logistical considerations, it would seem more desirable to 
have a replacement policy which would lead asymptotically to ex- 
clusive usage of the type with the greater longevity. Replacement 
policies having this desirable property could be devised. However, 
these policies make reference to past results (information concerning 
failures at the preceding inspections). In Appendix A4 it is proved 
that if only “simple” replacement policies which are easy to apply 
under operational conditions are considered, then it is impossible to 







LIFE OF EQUIPMENT UNDER OPERATIONAL CONDITIONS 511 


devise a policy which would lead asymptotically to exclusive usage of 
the type with the greater longevity. By a “simple” replacement policy 
we mean a policy which determines the number of replacements 
of each type using at most the following two facts: (1) the population 
composition after the last inspection, and (2) the number of failures of 
each type since the last inspection. 

2.1.7. A logistical consideration. Suppose logistical conditions are 
such that about the same number of each type of equipment is avail- 
able. It would then be desirable to adopt replacement policies which 
use about the same number of each type of equipment as replacements. 
If a symmetric type of replacement policy is applied, about the same 
number of each type of equipment will be used as replacements (see 
Appendix A5). 

We shall see in Section 4.2 that the replacement policies of the 
symmetric type are the only replacement policies among a more general 
type of policy which have this property of equal usage of replacements. 
When the switch policy is adopted, fewer replacements will be needed 
on the average (see Appendix A5). 

2.1.8. Estimating absolute longevities. In the preceding sections the 
only information used in estimating and testing hypotheses was ob- 
tained by observing the composition of a sample (or samples) of items. 
In this section we consider the case where information concerning stock 
(that is, the number of items that have been replaced since the first 
inspection) is also available. Using this added knowledge, the absolute 
longevities of the types of equipment can be estimated. 

Applying the results of Appendix A5, the average of the number $R_x$ of replacements used at the $x$th inspection is $2N\,\mathrm{Pr}_x(r \mid p)$ (see equation (57)). The number of replacements used since the first inspection is $U_x = \sum_{j=1}^{x} R_j$, and its average is

$$E(U_x \mid p) = \sum_{j=1}^{x} E(R_j \mid p) = 2N\sum_{j=1}^{x} \mathrm{Pr}_j(r \mid p) = \frac{4Nx}{L_1+L_2} + \frac{N(L_1-L_2)^2}{p(L_1+L_2)^2}\left\{1 - [1 - p(f_1+f_2)]^x\right\}. \tag{12}$$

Hence, $U_x/4Nx$ is a biased estimate of $1/(L_1+L_2)$, but the bias approaches zero when the replacement policy is used for a long period of time. The bias approaches zero most rapidly for the switch policy. The variance of $U_x/4Nx$ approaches zero as $x$ becomes large. Therefore $U_x/4Nx$ converges in probability to $1/(L_1+L_2)$, and $4Nx/U_x$ converges in probability to $L_1+L_2$ (see [2], p. 255).
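Equation (12) can be checked against a direct summation of $2N\,\mathrm{Pr}_j(r \mid p)$. The sketch takes $\mathrm{Pr}_j(r \mid p) = \mathrm{Pr}_{j-1}(1 \mid p)f_1 + \mathrm{Pr}_{j-1}(2 \mid p)f_2$ as the chance that a lineage needs a replacement at the $j$th inspection. That formula is our reading of Appendix A5, which is not reproduced here. The function names are hypothetical.

```python
def expected_replacements_iter(N, x, p, f1, f2):
    """E(U_x | p) = sum_j 2N Pr_j(r | p), iterating the difference equation (47)."""
    pr1 = 0.5                                    # Pr_0(1 | p): N items of each type
    total = 0.0
    for _ in range(x):
        # chance that a lineage needs a replacement at this inspection
        total += 2 * N * (pr1 * f1 + (1 - pr1) * f2)
        pr1 = pr1 * (1 - f1 * p - f2 * p) + f2 * p
    return total


def expected_replacements_closed(N, x, p, f1, f2):
    """Closed form (12)."""
    L1, L2 = 1 / f1, 1 / f2
    q = 1 - p * (f1 + f2)
    return 4 * N * x / (L1 + L2) + N * (L1 - L2) ** 2 / (p * (L1 + L2) ** 2) * (1 - q ** x)


print(expected_replacements_closed(15, 10, 1.0, 0.5, 0.25))
```

For the tumbler example of Section 3 ($N = 15$, $x = 10$, switch policy), equation (12) gives an expected total of about 101.7 replacements.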

If both the sample composition and information concerning stock are available, the absolute longevity, say $L_1$, can be estimated using

$$\frac{2Nxn_1}{U_x n}. \tag{13}$$

From Sections 2.1.2 and 2.1.3 we see that, as $x$ approaches infinity, the limiting distribution of $n_1$ is a binomial distribution with parameters $2n$ and $L_1/(L_1+L_2)$. Hence the expected value of the limiting distribution of $2Nxn_1/U_x n$ is $(L_1+L_2)[L_1/(L_1+L_2)] = L_1$; that is, the bias of the limiting distribution of the estimate is nil. The square root of the mean square error of this estimate of absolute longevity can be estimated by

$$\left(\frac{2Nx}{nU_x}\right)\sqrt{n_1n_2/2n}. \tag{14}$$

Using Slutsky's theorem (see [2], p. 255), the estimate is found to be consistent when $n$ is also large.
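Equations (13) and (14) translate directly into code. The sketch below uses hypothetical function names and plugs in the counts reported for the switch-policy run in Section 3. It reproduces the values 2.5, 4.3, and .6 given there.

```python
import math


def absolute_longevity(N, x, U, n, n_i):
    """Equation (13): 2 N x n_i / (U_x n), estimating L_i."""
    return 2 * N * x * n_i / (U * n)


def longevity_std_error(N, x, U, n, n1, n2):
    """Equation (14): (2 N x / (n U_x)) * sqrt(n1 n2 / 2n)."""
    return (2 * N * x / (n * U)) * math.sqrt(n1 * n2 / (2 * n))


# Switch-policy run of Section 3: 2N = 30 tumblers, whole population sampled (2n = 30),
# x = 10 weeks, 11 type 1 and 19 type 2 tumblers, U_x = 88 replacements.
N, n, x, U = 15, 15, 10, 88
print(absolute_longevity(N, x, U, n, 11), absolute_longevity(N, x, U, n, 19),
      longevity_std_error(N, x, U, n, 11, 19))
```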

It should be pointed out that if information concerning stock is available and no logistical restrictions are imposed, replacement policies other than those investigated in the preceding sections might be better. For example, consider the replacement policy under which an item that fails is replaced by an item of the same type. Clearly, more items of the type with the shorter longevity will be used on the average. This runs contrary to the logistical consideration pointed out for the symmetric policies in Section 2.1.7. Hence the logistics must be such as to make it possible to supply any proportion of each type.

Consider the history of a single item of type 1 in the initial population. The chance that a replacement will be needed at the $x$th inspection is $f_1$. The average of the number $V(1)$ of replacements of type 1 used since the first inspection is $Nxf_1$. Hence, $V(1)/Nx$ is an unbiased estimate of $f_1$. Also, the variance of $V(1)/Nx$ approaches zero as the replacement rule is used for a long period of time. Therefore $V(1)/Nx$ converges in probability to $f_1$, and $Nx/V(1)$ converges in probability to $L_1$. Hence, when each item which fails is replaced by its own type, consistent estimates of longevities are obtained by using only the information concerning stock. If information concerning stock is not available, this replacement policy is obviously useless for calculating relative longevities from population or sample composition, because the population composition remains the same.
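A small simulation illustrates the own-type policy. Because the population composition never changes, only the stock count $V(1)$ carries information. The sketch assumes a constant risk $f_1$, as in Section 2.1, and uses a hypothetical function name. The seed and sizes are arbitrary.

```python
import random


def own_type_longevity(N, x, f, rng):
    """Replace every failed item by one of its own type; estimate L = N x / V.

    N items of one type are inspected x times; each fails at an inspection with
    probability f (constant risk, as assumed in Section 2.1).
    """
    V = 0                                  # replacements used since the first inspection
    for _ in range(x):
        V += sum(rng.random() < f for _ in range(N))
    return N * x / V


rng = random.Random(1953)
print(own_type_longevity(N=2000, x=50, f=0.5, rng=rng))   # true L = 1/f = 2
```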

2.1.9. Comparison of switch and 50-50 policies. The 50-50 policy has the advantage that it is not necessary to observe the type of an item
which has failed in order to replace it, since the chance that a replace- 
ment will be of type 1 is the same whether the failing item is of type 1 
or of type 2. We shall now make some direct comparisons between the 
50-50 policy and the switch policy. 

In the preceding sections the switch policy was found to have vari- 
ous optimum properties. It should be pointed out, however, that al- 
though the switch policy is “best” in various senses, it is only “slightly 
better” than the other replacement policies of the type investigated. 
For example, the difference in bias $B(p)$ of the estimate $n_1/2n$ when the 50-50 policy is adopted rather than the switch policy is

$$B(1/2) - B(1) = \frac{L_2 - L_1}{2(L_1+L_2)}\left\{[1 - (f_1+f_2)/2]^x - [1 - (f_1+f_2)]^x\right\}. \tag{15}$$

This difference is small when the policies are used for a long period of time. The ratio $B(1/2)/B(1)$, however, approaches infinity when the policies are used for a long period of time.
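Equation (15) and the divergence of the ratio can be seen numerically with the tumbler parameters of Section 3 ($f_1 = .50$, $f_2 = .25$). For these values the ratio $B(1/2)/B(1)$ is exactly $(.625/.25)^x = 2.5^x$. The function name is hypothetical.

```python
def bias(x, p, f1, f2):
    """B(p) = Pr_x(1 | p) - L1/(L1 + L2), from equation (49)."""
    L1, L2 = 1 / f1, 1 / f2
    return (L2 - L1) / (2 * (L1 + L2)) * (1 - p * (f1 + f2)) ** x


for x in (1, 5, 10, 20):
    b_half, b_switch = bias(x, 0.5, 0.5, 0.25), bias(x, 1.0, 0.5, 0.25)
    # for these inputs the difference shrinks while the ratio grows
    print(x, b_half - b_switch, b_half / b_switch)
```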

Although the switch policy is only “slightly better” in various senses than the 50-50 policy, to reduce the bias of the estimates of the relative
longevities to any specified amount takes at least twice as many inspec- 
tions under the 50-50 policy as under the switch policy (see Appendix 
A6). Also to reduce the mean square error of the estimates to any speci- 
fied amount takes at least twice as many inspections under the 50-50 
policy as under the switch policy. However, the difference between the 
mean square errors of the estimates is small when the policies are used 
for a long period of time. 
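The factor of two can be made concrete by solving $|B(p)| = \epsilon$ for a real-valued $x$. This gives $x = \log(\epsilon/|c|)/\log[1 - p(f_1+f_2)]$, with $c = (L_2-L_1)/2(L_1+L_2)$. The ratio of the 50-50 requirement to the switch requirement is then $\log(1-a)/\log(1-a/2)$, with $a = f_1+f_2$. This ratio is at least 2 because $(1-a/2)^2 \ge 1-a$. Treating $x$ as continuous is an assumption of this sketch; with whole inspections, rounding up can shift the ratio slightly. The function name is hypothetical.

```python
import math


def inspections_for_bias(eps, p, f1, f2):
    """Smallest real x with |B(p)| <= eps.

    Assumes 0 < p (f1 + f2) < 1 and eps < |B| at x = 0.
    """
    L1, L2 = 1 / f1, 1 / f2
    c = abs(L2 - L1) / (2 * (L1 + L2))
    q = 1 - p * (f1 + f2)
    return math.log(eps / c) / math.log(q)


for eps in (1e-2, 1e-3, 1e-4):
    x_half = inspections_for_bias(eps, 0.5, 0.5, 0.25)
    x_switch = inspections_for_bias(eps, 1.0, 0.5, 0.25)
    print(eps, x_half, x_switch, x_half / x_switch)   # ratio is at least 2
```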

2.1.10. Length of the inspection interval. In the preceding sections it has been assumed that the chance is $f_i$ that an item of type $i$ will fail in the time interval between two successive inspections if it had not failed at the earlier inspection. A small change $d$ in the time interval between successive inspections might have the following effect. The probability becomes $f_i(1+d)$ that an item of type $i$ which had not failed by the $x$th inspection will fail by the $(x+1)$th inspection. In that case the bias $B(p)$ of using $n_1/2n$ as an estimate of $L_1/(L_1+L_2)$ is

$$B(p) = \frac{L_2 - L_1}{2(L_1+L_2)}\,[1 - p(f_1+f_2)(1+d)]^x. \tag{16}$$

When $p(f_1+f_2)(1+d) \le 1$, the absolute value of the bias is a decreasing function of $d$ and hence will be minimized when $d$ is maximized. Therefore, when $x$ inspections are to be made, take the time between inspections to be the largest interval which will change the probability of a failure by a multiplicative factor. This minimizes the bias and the mean square error and maximizes the power of the tests. It will, of course, increase the total time which will elapse before estimates are made.

2.1.11. Length of the replacement interval. Suppose the replacement policies are modified so that replacements are made, if necessary, only on inspections $z, 2z, 3z, \ldots$. That is, even if items fail immediately after inspection $jz$, they will not be replaced until inspection $(j+1)z$. When $z > 1$, this method has the advantage of requiring fewer replacement inspections in a given time period. It has the disadvantage of permitting the total number of items in operation to vary somewhat. Suppose the replacement policies are used for a long period of time. Then the bias and mean square error of the estimates of relative longevities are minimized, and the power of tests concerning relative longevities is maximized, when items which have failed are replaced at the next inspection; that is, when $z = 1$ (see Appendix A7).

2.1.12. Continuous inspection and replacement. In the preceding sec- 
tions we consider the case where inspections are made at periodic inter- 
vals at which time the items which failed are replaced. Most of the 
results obtained in the preceding sections for the case of periodic in- 
spections will hold even in the case of instantaneous replacement upon 
failure since the basic difference equation obtained in the former case 
(see Appendix A1) is analogous to the differential equation obtained 
in the latter case (see Appendix A8). 


2.2. A General Type of Replacement Policy 


2.2.1. Definition of the type of replacement policy. Consider the following type of replacement policy. When a type 1 item fails, the chance is $p$ that its replacement will be of type 2, and when a type 2 item fails, the chance is $Mp$ that its replacement will be of type 1. Without any essential loss of generality, we can assume that $M \ge 1$ and $0 < p \le 1/M$. When $M = 1$ we have the symmetric type of replacement policy investigated in Section 2.1.

By reasoning similar to that in Appendix A1, the probability that a descendant of an item in the initial population will be of type 1 at the $x$th inspection may be represented by

$$\mathrm{Pr}_x(1 \mid p) = \mathrm{Pr}_{x-1}(1 \mid p)(1 - f_1p - f_2Mp) + f_2Mp = \frac{L_1M}{L_1M+L_2} + \frac{1}{2}\,\frac{L_2 - L_1M}{L_1M+L_2}\,[1 - p(f_1+f_2M)]^x. \tag{17}$$

If $f_1 + f_2M \le M$, the absolute value of the second term in the sum is minimized when $p$ attains its maximum value $1/M$. Consider the "switch ($M$)" policy: when a type 2 item fails, it is replaced by an item of type 1, and when a type 1 item fails, the chance is $1/M$ that its replacement will be of type 2. This policy will have some optimum properties similar to those proved for the switch policy.
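As a check on equation (17), the sketch below iterates the difference equation and compares the result with the closed form. It does this for a few values of $M$ and of $p \le 1/M$. The function names are hypothetical, and $f_1 = .50$, $f_2 = .25$ are borrowed from Section 3.

```python
def pr_type1_M_recurrence(x, p, M, f1, f2):
    """Iterate Pr_x(1|p) = Pr_{x-1}(1|p)(1 - f1 p - f2 M p) + f2 M p from Pr_0 = 1/2."""
    pr = 0.5
    for _ in range(x):
        pr = pr * (1 - f1 * p - f2 * M * p) + f2 * M * p
    return pr


def pr_type1_M_closed(x, p, M, f1, f2):
    """Closed form in equation (17)."""
    L1, L2 = 1 / f1, 1 / f2
    q = 1 - p * (f1 + f2 * M)
    return L1 * M / (L1 * M + L2) + 0.5 * (L2 - L1 * M) / (L1 * M + L2) * q ** x


M = 1.5
print(pr_type1_M_recurrence(10, 1 / M, M, 0.5, 0.25), pr_type1_M_closed(10, 1 / M, M, 0.5, 0.25))
```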

2.2.2. Estimating and testing hypotheses concerning relative longevities. Suppose a sample of $2n$ items is drawn at random from the $2N$ items of which the population is composed at the $x$th inspection. Write $n_i$ for the number of items of type $i$ observed in the sample. The expected value of $n_1/2nM$ is $\mathrm{Pr}_x(1 \mid p)/M$. Consider $n_1/2nM$ as an estimate of $L_1/(L_1M+L_2)$. If the replacement policy has been in use for a long period of time, its bias approaches zero most rapidly when the switch ($M$) policy is adopted, and its mean square error is also smallest in that case. When $n$ is also large, the mean square error of the estimate approaches zero, and hence $n_1/2nM$ converges in probability to $L_1/(L_1M+L_2)$. Also, $n_1/n_2M$ is a consistent estimate of $L_1/L_2$, and $n_1/(n_1+n_2M)$ is a consistent estimate of $L_1/(L_1+L_2)$.

Suppose samples of $2n$ items are drawn at random from the population on the $x$th, $2x$th, $3x$th, $\ldots$, $kx$th inspection. Write $n_i(k)$ for the total number of items of type $i$ among the $2nk$ items observed in the $k$ samples. The statistics $n_1(k)/2nkM$, $n_1(k)/n_2(k)M$, and $n_1(k)/[n_1(k)+n_2(k)M]$ are then consistent estimates of $L_1/(L_1M+L_2)$, $L_1/L_2$, and $L_1/(L_1+L_2)$, respectively. This holds when $kn$ is large and the replacement policy has been in use for a long period of time.

A reasonable symmetric procedure for choosing which type of equipment has the greater longevity is the following. Choose type 1 if $n_2M/n_1 \le c$, and choose type 2 if $n_1/n_2M \le c$. Here $c < 1$ is a constant determined by the amount of indecision we are willing to permit. Suppose type 1 in fact has the greater longevity. The probability of the incorrect decision of choosing type 2 is

$$P = \mathrm{Pr}_x[n_1 \le cM(2n - n_1)] = \mathrm{Pr}_x\!\left[n_1 \le \frac{2ncM}{1+cM}\right] = \sum_{n_1=0}^{\lfloor 2ncM/(1+cM)\rfloor} \binom{2n}{n_1}[\mathrm{Pr}_x(1 \mid p)]^{n_1}[\mathrm{Pr}_x(2 \mid p)]^{2n-n_1}, \tag{18}$$

which may be computed directly or approximated by the normal approximation to binomial variates.

Sequential and nonsequential methods analogous to those presented in Section 2.1.4 may be used to test hypotheses concerning relative longevities.
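Equation (18) is a binomial tail, so it can be evaluated directly rather than by the normal approximation. The sketch uses a hypothetical function name. The argument `pr1` stands for $\mathrm{Pr}_x(1 \mid p)$ computed from equation (17), and the upper summation limit is rounded down to an integer.

```python
import math


def prob_wrong_choice(n, c, M, pr1):
    """Equation (18): chance of choosing type 2 when type 1 has the greater longevity.

    n1 ~ Binomial(2n, pr1), where pr1 stands for Pr_x(1 | p); type 2 is chosen
    when n1 <= 2ncM / (1 + cM).
    """
    # a tiny tolerance keeps an exactly integral bound from being rounded down
    # by floating-point error
    cutoff = math.floor(2 * n * c * M / (1 + c * M) + 1e-9)
    return sum(math.comb(2 * n, j) * pr1 ** j * (1 - pr1) ** (2 * n - j)
               for j in range(cutoff + 1))


print(prob_wrong_choice(15, 0.8, 1.0, 0.6))
```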







2.2.3. A logistical consideration and estimates of absolute longevities.
The symmetric type of replacement policy has the property that 
about an equal number of items of each type will be used as replace- 
ments (see Section 2.1.7). Let us now consider the case where logistical 
conditions are such that about M times as many items of type 1 may 
be used for replacements as items of type 2. This logistical restriction 
will be satisfied if a replacement policy of the type defined in Section 
2.2.1 is adopted (that is, if the chance that a type 2 item will be re- 
placed by its opposite type is M times the chance that a type 1 item 
will be replaced by its opposite type). We shall see in Section 4.2 that 
replacement policies of this type are the only policies among a more 
general type of policy which satisfy this logistic restriction. When the 
switch (M) policy is adopted fewer replacements will be needed on the 
average. 

The formulas presented in Appendix A5 for the symmetric type of replacement policy ($M = 1$) may be generalized to the type of replacement policy investigated in this section. We obtain the following results. Consider the history of a single item in the initial population. The probability that a replacement of type 1 was used at the $x$th inspection is

$$\begin{aligned}\mathrm{Pr}_x(r = 1 \mid p) &= \mathrm{Pr}_{x-1}(1 \mid p)f_1(1-p) + \mathrm{Pr}_{x-1}(2 \mid p)f_2Mp \\ &= \frac{M}{L_1M+L_2} + \left\{\frac{f_1(1-p) + f_2Mp}{2} - \frac{M}{L_1M+L_2}\right\}[1 - p(f_1+f_2M)]^{x-1}.\end{aligned} \tag{19}$$

When the replacement policy is in use for a long period of time, we have

$$\mathrm{Pr}_\infty(r = 1 \mid p) = M\,\mathrm{Pr}_\infty(r = 2 \mid p) = \frac{M}{L_1M+L_2}. \tag{20}$$

That is, about $M$ times as many type 1 items as type 2 items will be used for replacements. The probability that a replacement will be needed is

$$\begin{aligned}\mathrm{Pr}_x(r \mid p) &= \mathrm{Pr}_x(r = 1 \mid p) + \mathrm{Pr}_x(r = 2 \mid p) \\ &= \frac{M+1}{L_1M+L_2} + \frac{1}{2}[1 - p(f_1+f_2M)]^{x-1}\,\frac{(M-1)L_1(L_1-L_2) + (L_1-L_2)^2}{L_1L_2(L_1M+L_2)}.\end{aligned} \tag{21}$$
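Equations (19) to (21) can be checked by iterating the lineage probabilities directly. The sketch below uses hypothetical names. It also shows the long-run ratio of type 1 to type 2 replacements approaching $M$, as in equation (20).

```python
def replacement_probs(x, p, M, f1, f2):
    """Return (Pr_x(r=1|p), Pr_x(r=2|p)) by iterating the lineage probabilities.

    Type 1 failures are replaced by type 2 with probability p;
    type 2 failures are replaced by type 1 with probability M p.
    """
    pr1 = 0.5                                    # Pr_0(1 | p)
    for _ in range(x - 1):                       # advance to Pr_{x-1}(1 | p)
        pr1 = pr1 * (1 - f1 * p - f2 * M * p) + f2 * M * p
    pr2 = 1 - pr1
    r1 = pr1 * f1 * (1 - p) + pr2 * f2 * M * p   # replacement of type 1 used
    r2 = pr1 * f1 * p + pr2 * f2 * (1 - M * p)   # replacement of type 2 used
    return r1, r2


r1, r2 = replacement_probs(200, 0.4, 1.5, 0.5, 0.25)
print(r1 / r2)                                   # close to M = 1.5
```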


If information concerning stock is available (that is, if the number $U_x$ of replacements used since the first inspection is known), we can estimate absolute longevities. By an approach similar to that of Section 2.1.8 we find that, as $x$ becomes large, $U_x/2N(M+1)x$ converges in probability to $1/(L_1M+L_2)$ and $2N(M+1)x/U_x$ converges in probability to $L_1M+L_2$. Hence, $N(M+1)xn_1/U_xMn$ can be used to estimate the absolute longevity $L_1$. The bias of the limiting distribution of this estimate is nil when the replacement rule is used for a long period of time, and the estimate is consistent when $n$ becomes large.


3. A NUMERICAL ILLUSTRATION 


We have attempted to reproduce a situation similar to the adoption of replacement policies in a restaurant [1] where thirty tumblers are utilized and the replacement policy is used for ten weeks ($x = 10$). A small number ($2N = 30$) of tumblers was used in order to simplify the simulation. The restaurant situation was simulated by use of a table of random numbers. For a type 1 tumbler which had not failed by week $x-1$, the chance $f_1$ of a failure by the $x$th week was taken as .50, and $f_2$ was taken as .25. Thus the relative longevity of type 1 tumblers was $f_2/(f_1+f_2) = 1/3$. The initial population consisted of 15 tumblers of each type, and the 50-50 policy was adopted.

At the end of the tenth week there were 14 type 1 tumblers in use. An estimate of the relative longevity of type 1 tumblers based only on the composition of the population at the end of the tenth week would be $14/30 = .467$ (see Section 2.1.3). The expected value of the estimate is

$$\mathrm{Pr}_{10}(1 \mid 1/2) = 1/3 + (1/2)(1/3)[1 - (1/2)(.75)]^{10} = .335 \quad \text{[see equation (2)]}.$$

The square root of the mean square error is estimated by $\sqrt{(.467)(.533)/30} = .091$ [see equation (6)]. From equation (5) we find that the square root of the mean square error is in fact $\sqrt{(.335)(.665)/30 + (.002)^2} = .086$.

Suppose the type of tumbler which appears most frequently in the sample is chosen as the type with the greater longevity. We would then have correctly chosen the type 2 tumbler on the basis of sample composition. The probability of making the incorrect decision of choosing type 1 is computed from equation (7) and found to be

$$P = \sum_{j=0}^{14}\binom{30}{j}(.665)^j(.335)^{30-j} = .026. \tag{22}$$








The simulation of the restaurant situation was repeated using the switch policy for making replacements. All other conditions were the same as in the preceding simulation (that is, $f_1 = .50$, $f_2 = .25$, and the initial population consisted of 15 tumblers of each type). At the end of the tenth week there were 11 type 1 tumblers in use. An estimate of the relative longevity is $11/30 = .367$. The expected value of this estimate is

$$\mathrm{Pr}_{10}(1 \mid 1) = 1/3 + (1/2)(1/3)(1 - .75)^{10} = .333 \tag{23}$$

[see equation (2)]. The square root of the mean square error is estimated by $\sqrt{(.367)(.633)/30} = .088$ [see equation (6)]. From equation (5) we find that the square root of the mean square error is in fact

$$\sqrt{(.333)(.667)/30} = .086. \tag{24}$$


Suppose the type of tumbler which appears most frequently in the sample is chosen as the type with the greater longevity. We would then have correctly chosen the type 2 tumbler on the basis of sample composition. The probability of making the incorrect decision of choosing type 1 may be computed from equation (7) and found to be

$$P = \sum_{j=0}^{14}\binom{30}{j}(.667)^j(.333)^{30-j} = .025. \tag{25}$$

We also found that 88 tumblers were needed as replacements during the 10 weeks the switch policy was used. With this added information we can estimate the absolute longevities, whose true values were $L_1 = 1/.50 = 2$ and $L_2 = 1/.25 = 4$. Using equation (13), the estimates are $2(10)(11)/88 = (.367)600/88 = 2.5$ and $(.633)600/88 = 4.3$, respectively. The square root of the mean square error (the standard error) of these estimates is estimated as approximately $(.088)600/88 = .6$ [see equation (14)]. In this numerical illustration, the estimates of the relative and absolute longevities under the switch policy were all within one standard error of their true values.
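The paper simulated the restaurant with a table of random numbers. A pseudo-random version of the same experiment lets one repeat it many times and compare the average outcome with the expected values. The function name is hypothetical, and the seeds and number of runs are arbitrary. Over many runs, the mean proportion of type 1 tumblers after ten weeks should be near .335 under the 50-50 policy and near .333 under the switch policy. Under the switch policy, the mean number of replacements should be near the value given by equation (12), about 101.7.

```python
import random


def simulate_tumblers(p, f1=0.50, f2=0.25, n_each=15, weeks=10, rng=None):
    """One run of the symmetric replacement policy with switch probability p.

    Returns (number of type 1 items after `weeks` inspections, replacements used).
    """
    if rng is None:
        rng = random.Random()
    items = [1] * n_each + [2] * n_each
    replacements = 0
    for _ in range(weeks):
        for idx, kind in enumerate(items):
            if rng.random() < (f1 if kind == 1 else f2):   # item fails at this inspection
                replacements += 1
                if rng.random() < p:                       # replaced by the opposite type
                    items[idx] = 3 - kind
    return items.count(1), replacements


rng = random.Random(1953)
print(simulate_tumblers(1.0, rng=rng))   # one run: (type 1 count, replacements)
```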


4. COMPARING k TYPES OF EQUIPMENT 
4.1. Definition of the Type of Replacement Policy and Estimates of Relative Longevities

Consider the following type of replacement policy: when an item of type $i$ fails, the probability is $p_{ij}$ that its replacement will be of type $j$ ($i, j = 1, 2, \ldots, k$ and $\sum_{j=1}^{k} p_{ij} = 1$). Then the probability that the descendant of an item in the initial population will be of type $u$ at the $x$th inspection is

$$\mathrm{Pr}_x(u) = \mathrm{Pr}_{x-1}(u)s_u + \sum_{i=1}^{k}\mathrm{Pr}_{x-1}(i)f_ip_{iu} \tag{26}$$

or

$$\mathrm{Pr}_x(u) - \mathrm{Pr}_{x-1}(u) + \mathrm{Pr}_{x-1}(u)f_u = \sum_{i=1}^{k}\mathrm{Pr}_{x-1}(i)f_ip_{iu}, \tag{27}$$

where $f_i$ is the probability that an item of type $i$ which had not failed by the $t$th inspection will fail by inspection $t+1$, and $s_i = 1 - f_i$.

This type of replacement policy clearly does not include all possible 
replacement policies. A policy which is not in this class is: When an 
item fails replace it by one of the type opposite that of the preceding 
replacement. The first replacement to the population will be of type 1 
with probability 1/2. This policy differs from those investigated herein
in that the type of replacement is determined by the type of the pre- 
ceding replacement rather than by the type of item which failed. This 
policy is similar to the 50-50 policy since about 50% of the replace- 
ments will be of the type different from the item which failed. Also, 
when inspections are made at discrete inspection times, about equal 
numbers of each type of equipment will be used as replacements if 
either this policy or the 50-50 policy is adopted. For practical purposes 
the analysis presented herein of the 50-50 policy may be used as an 
approximation to the analysis of the other replacement policy. 

We shall consider the case where the $p_{ij}$ and $f_i$ are such that, as $x$ becomes large, $\mathrm{Pr}_x(u)$ has a nonzero limit for $u = 1, 2, \ldots, k$ (see [3], p. 325). Hence,

$$\mathrm{Pr}_\infty(u)f_u = \mathrm{Pr}(u)f_u = \sum_{i=1}^{k}\mathrm{Pr}(i)f_ip_{iu}, \tag{28}$$

for $u = 1, 2, \ldots, k$, and the distribution $\mathrm{Pr}(u)$ is uniquely determined by this system of equations. Hence, $\mathrm{Pr}(u) = L_u/\sum_{i=1}^{k} L_i$ if and only if $\sum_{i=1}^{k} p_{iu} = 1$ for $u = 1, 2, \ldots, k$. For $k = 2$, this implies that $p_{11} + p_{21} = p_{11} + p_{12}$, or $p_{12} = p_{21} = p$. In other words, among the policies investigated herein, only the symmetrical type of replacement policy makes the average population composition of the various types proportional to their longevities in the long run.

We also have that $\mathrm{Pr}(u) = M_uL_u/\sum_{i=1}^{k} M_iL_i$ if and only if $\sum_{i=1}^{k} M_ip_{iu} = M_u$ for $u = 1, 2, \ldots, k$. Take $k = 2$. Among the policies described herein, only the type studied in Section 2.2 makes the average number of type 1 items in the population $ML_1/L_2$ times the number of type 2 items in the long run.

For any given replacement policy which has been in use for a long period of time, estimates of the relative longevity $L_u/\sum_{i=1}^{k} L_i$ may be obtained which are based only on the composition of an observed sample. That is, let $p_{ij}$ describe the given replacement policy and let $n_u$ be the number of items of type $u$ observed in a sample of $n$ items. We first solve the following system of $k$ linear equations for $M_u$:

$$M_u = \sum_{i=1}^{k} M_ip_{iu}. \tag{29}$$

Then $n_u/nM_u$ and $n_uM_u^{-1}/\sum_{i=1}^{k} n_iM_i^{-1}$ are consistent estimates of $L_u/\sum_{i=1}^{k} M_iL_i$ and $L_u/\sum_{i=1}^{k} L_i$, respectively, when $n$ is large.
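Equation (29) says that $(M_1, \ldots, M_k)$ is a left fixed vector of the replacement matrix $(p_{iu})$, so it can be found with a linear solver. The sketch below uses NumPy, though that is an assumption and any linear-algebra routine would do. It normalizes the $M_u$ to sum to 1 and then forms the second estimate above from sample counts. The function names are hypothetical.

```python
import numpy as np


def replacement_weights(P):
    """Solve M_u = sum_i M_i p_iu (equation (29)), normalised so that sum(M) = 1."""
    P = np.asarray(P, dtype=float)
    k = P.shape[0]
    A = np.vstack([P.T - np.eye(k), np.ones((1, k))])   # fixed-point rows + normalisation
    b = np.zeros(k + 1)
    b[-1] = 1.0
    M, *_ = np.linalg.lstsq(A, b, rcond=None)
    return M


def relative_longevities(P, counts):
    """Estimate L_u / sum_i L_i by n_u M_u^{-1} / sum_i n_i M_i^{-1}."""
    M = replacement_weights(P)
    w = np.asarray(counts, dtype=float) / M
    return w / w.sum()


# Section 2.2 policy with M = 2, p = .25: rows are (p_i1, p_i2) for a failing item of type i.
P = [[0.75, 0.25], [0.5, 0.5]]
print(replacement_weights(P), relative_longevities(P, [50, 50]))
```

Take the Section 2.2 policy with $M = 2$ and equal sample counts. The estimate is then $(1/3, 2/3)$, which is what one would expect for $L_1 = 2$, $L_2 = 4$.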
4.2. A Logistical Consideration and Estimates of Absolute Longevities 


The chance that a replacement of type $u$ will be needed at the $x$th inspection as a descendant for an item in the initial population is

$$\mathrm{Pr}_x(r = u) = \sum_{i=1}^{k}\mathrm{Pr}_{x-1}(i)f_ip_{iu}. \tag{30}$$

Hence, when the replacement rule has been in use for a long period of time,

$$\mathrm{Pr}_\infty(r = u) = \mathrm{Pr}(r = u) = \sum_{i=1}^{k}\mathrm{Pr}(i)f_ip_{iu} = \mathrm{Pr}(u)f_u. \tag{31}$$

Therefore

$$\mathrm{Pr}(r = u) = M_u\Big/\sum_{i=1}^{k} M_iL_i, \tag{32}$$

where the values of $M_u$ are determined by the system of equations

$$M_u = \sum_{i=1}^{k} M_ip_{iu} \quad \text{for } u = 1, 2, \ldots, k. \tag{33}$$

For $k = 2$, we see that the type of replacement policy investigated in Section 2.2 is the only policy among those described herein which will in the long run use for replacements $M$ times as many type 1 items as items of type 2.

Suppose the proportions $M_u/\sum_{i=1}^{k} M_i$ of the total replacements which are of type $u$ are given by logistical considerations. Then the type of replacement policy which satisfies this supply restriction may be determined by the $\{p_{ij}\}$ satisfying the system of equations

$$M_u = \sum_{i=1}^{k} M_ip_{iu}, \qquad 1 = \sum_{i=1}^{k} p_{ui}, \tag{34}$$

for $u = 1, 2, \ldots, k$. For example, it may be desired that $p_{iu}$ be independent of $i$ (that is, $p_{iu} = p_u$, so that the type of the failing item need not be observed). Then $p_u$ must be equal to

$$M_u\Big/\sum_{i=1}^{k} M_i. \tag{35}$$

Since $\sum_{u=1}^{k}\mathrm{Pr}(r = u) = \sum_{u=1}^{k} M_u/\sum_{i=1}^{k} M_iL_i$, the average of the number $U_x$ of replacements used since the first inspection will be approximately

$$\left(Nx\sum_{u=1}^{k} M_u\right)\Big/\sum_{i=1}^{k} M_iL_i, \tag{36}$$

where the population consists of $N$ items. As $x$ becomes large, $U_x/(Nx\sum_{u=1}^{k} M_u)$ converges in probability to $1/\sum_{i=1}^{k} M_iL_i$, and $(Nx\sum_{u=1}^{k} M_u)/U_x$ converges in probability to $\sum_{i=1}^{k} M_iL_i$. Suppose information concerning stock is available, the replacement rule has been in use for a long period of time, and $n$ is large. Then $n_uNx(\sum_{i=1}^{k} M_i)/U_xnM_u$ is a consistent estimate of the longevity $L_u$ of items of type $u$.
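Equations (35) and (36) give a policy that meets a prescribed supply mix and an estimate of absolute longevity from stock records. In this sketch the names are hypothetical, `N` is the total population size (as in this section), and `n` is the sample size. Take $k = 2$, $M_1 = M_2$, and the counts from the switch-policy run of Section 3 ($N = 30$, $n = 30$, $x = 10$, $U_x = 88$). The estimate then reduces to equation (13).

```python
def policy_from_supply(M):
    """Equation (35): p_u = M_u / sum_i M_i (the failed item's type is ignored)."""
    total = sum(M)
    return [m / total for m in M]


def absolute_longevities(counts, M, N, x, U):
    """n_u N x (sum_i M_i) / (U_x n M_u) for each type u, with n = sum(counts)."""
    n = sum(counts)
    total_M = sum(M)
    return [c * N * x * total_M / (U * n * m) for c, m in zip(counts, M)]


print(policy_from_supply([3, 1, 1]))
print(absolute_longevities([11, 19], [1, 1], N=30, x=10, U=88))
```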


5. THE NONPARAMETRIC NATURE OF THE ASYMPTOTIC RESULTS 


In the preceding sections it was assumed that the probability that an item of type $i$, which had not failed by the $t$th inspection, will fail by inspection $t+1$, was a constant $1/L_i$. In this section we shall show that the asymptotic results obtained in the preceding sections remain valid even if this assumption is not true. That is, the asymptotic results do not depend on the assumption of a constant risk.

Let us assume only that the equipment has a finite life span. Let $a_{xi}$ be the probability that an item of type $i$ will serve for exactly $x$ inspections, with $\sum_{x=1}^{\infty} a_{xi} = 1$. The longevity of type $i$ items is

$$\sum_{x=1}^{\infty} x\,a_{xi} = \sum_{x=1}^{\infty} b_{xi} = L_i, \tag{37}$$

where $b_{xi} = \sum_{j=x}^{\infty} a_{ji}$ is the probability that an item of type $i$ will serve for at least $x$ inspections. The quantity $a_{xi}/b_{xi} = c_{xi}$ is the conditional probability that an item of type $i$ which has not failed on the first $x-1$ inspections will fail on the $x$th inspection. We wish to show that the asymptotic results depend only on the values of the $L_i$, and not directly on the values of the $a_{xi}$ (or $b_{xi}$, or $c_{xi}$). For $x$ sufficiently large, the probability that the descendant of an item in the initial population will be of type $u$ and will have served for exactly $t$ inspections is

$$\mathrm{Pr}_x(u, t) = \mathrm{Pr}_{x-1}(u, t-1)(1 - c_{tu}) \tag{38}$$

for $t \ge 1$, and

$$\mathrm{Pr}_x(u, 0) = \sum_{i=1}^{k}\sum_{j=1}^{\infty}\mathrm{Pr}_{x-1}(i, j-1)c_{ji}p_{iu}, \tag{39}$$

where $p_{iu}$ is the probability that a replacement for an item of type $i$ will be of type $u$.
Again we shall deal with the case where the $p_{iu}$ are such that, as $x$ becomes large, $\mathrm{Pr}_x(u, 0)$ has a nonzero limit for $u = 1, 2, \ldots, k$ (see [3], pp. 325 and 275). Then

$$\mathrm{Pr}_\infty(u, t) = \mathrm{Pr}(u, t) = \mathrm{Pr}(u, 0)\prod_{j=1}^{t}(1 - c_{ju}) = \mathrm{Pr}(u, 0)\,b_{t+1,u}, \tag{40}$$

and

$$\mathrm{Pr}(u, 0) = \sum_{i=1}^{k}\sum_{j=1}^{\infty}\mathrm{Pr}(i, 0)b_{ji}c_{ji}p_{iu} = \sum_{i=1}^{k}\mathrm{Pr}(i, 0)p_{iu}\sum_{j=1}^{\infty} a_{ji} = \sum_{i=1}^{k}\mathrm{Pr}(i, 0)p_{iu}. \tag{41}$$

The probability that a descendant of an item in the initial population will be of type $u$ approaches

$$\mathrm{Pr}(u) = \sum_{t=0}^{\infty}\mathrm{Pr}(u, t) = \mathrm{Pr}(u, 0)\sum_{t=0}^{\infty} b_{t+1,u} = \mathrm{Pr}(u, 0)L_u. \tag{42}$$

Hence, the proportion $M_u = \mathrm{Pr}(u, 0)/\sum_{i=1}^{k}\mathrm{Pr}(i, 0)$ of the total replacements which are of type $u$ is determined by the system of equations

$$M_u = \sum_{i=1}^{k} M_ip_{iu}, \qquad \sum_{i=1}^{k} M_i = 1, \tag{43}$$

and the probability that an item drawn at random from the population will be of type $u$ approaches $M_uL_u/\sum_{i=1}^{k} M_iL_i$. Also, $\sum_{u=1}^{k}\sum_{t=0}^{\infty}\mathrm{Pr}(u, t) = \sum_{u=1}^{k}\mathrm{Pr}(u, 0)L_u = 1$. Therefore, the probability that an item of type $u$ will be needed as a replacement is

$$\mathrm{Pr}(u, 0) = M_u\Big/\sum_{i=1}^{k} M_iL_i. \tag{44}$$










From these facts we see that when a given replacement rule is used for 
a long period of time, the results which we had obtained in the preced- 
ing sections remain valid (for equipment with a finite life span) even 
when the risk is not constant. 
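The argument of this section can be checked numerically by iterating equations (38) and (39) for lifetimes whose risk is not constant. The sketch below uses two made-up discrete life distributions, which are illustrative assumptions and not data from the paper:

- Type 1 serves 1, 2, or 3 inspections with probabilities .2, .5, .3, so $L_1 = 2.1$.
- Type 2 serves 2 or 4 inspections with probabilities .4, .6, so $L_2 = 3.2$.

Under a symmetric policy the limiting composition should be $L_u/\sum L_i$. Under the Section 2.2 policy with $M = 2$ it should be $M_uL_u/\sum M_iL_i$. The function name is hypothetical.

```python
def long_run_composition(lifetimes, policy, steps=1000):
    """Iterate equations (38)-(39) and return the limiting Pr(u) for each type.

    lifetimes[u] maps x -> a_{xu}, the chance that a type-u item serves exactly
    x inspections (finite support); policy[i][u] = p_iu.
    """
    k = len(lifetimes)
    span = [max(a) for a in lifetimes]             # longest possible life of each type
    hazard = []                                    # hazard[u][t] = c_{t+1,u}
    for a, m in zip(lifetimes, span):
        b, h = 1.0, []
        for x in range(1, m + 1):
            h.append(1.0 if x == m else a.get(x, 0.0) / b)
            b -= a.get(x, 0.0)
        hazard.append(h)
    state = [[1.0 / k] + [0.0] * (m - 1) for m in span]   # all items new at x = 0
    for _ in range(steps):
        new = [[0.0] * m for m in span]
        failed = [0.0] * k                         # mass of type-i items failing now
        for u in range(k):
            for t, mass in enumerate(state[u]):
                c = hazard[u][t]
                failed[u] += mass * c
                if t + 1 < span[u]:
                    new[u][t + 1] += mass * (1 - c)
        for i in range(k):
            for u in range(k):
                new[u][0] += failed[i] * policy[i][u]
        state = new
    return [sum(row) for row in state]


a1 = {1: 0.2, 2: 0.5, 3: 0.3}    # L1 = 2.1 (illustrative)
a2 = {2: 0.4, 4: 0.6}            # L2 = 3.2 (illustrative)
print(long_run_composition([a1, a2], [[0.5, 0.5], [0.5, 0.5]]))
```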


APPENDIX 
A1. The Difference Equation


We shall prove that the quantity $\mathrm{Pr}_x(1 \mid p)$ given by equation (2) represents the probability that an item will be of type 1 if it is drawn at random from the $2N$ items of which the population is composed at the $x$th inspection. Consider the history of one of the items of type 1 in the initial population. It may fail on, say, the 6th inspection and then be replaced by an item of the opposite type, which happens with probability $p$. The replacement might then fail on its 3rd inspection, and its own replacement might again be of the opposite type (with probability $p$). This history might be described by the sequence of numbers

1, 1, 1, 1, 1, 1, 2, 2, 2, 1, ...

or, more generally,

$u_0, u_1, u_2, u_3, u_4, u_5, u_6, u_7, u_8, u_9, \ldots$,

where $u_x = i$ when the "descendant" of an item in the initial population is of type $i$ at the $x$th inspection. We are considering the case where the population initially consisted of $N$ items of each type of equipment. Hence, if a member of the initial population is drawn at random, the probability is 1/2 that it will be of type 1; that is, $\mathrm{Pr}(u_0 = 1 \mid p) = \mathrm{Pr}_0(1 \mid p) = 1/2$. We see that $u_1 = 1$ if either

(a) $u_0 = 1$ and the item did not fail on the first inspection,

(b) $u_0 = 1$ and the item failed on the first inspection but its replacement was of the same type, or

(c) $u_0 = 2$ and the item failed on the first inspection and its replacement was of the opposite type.

The probabilities of (a), (b), and (c) are

(a) $\mathrm{Pr}_0(1 \mid p)s_1$,
(b) $\mathrm{Pr}_0(1 \mid p)f_1(1-p)$,
(c) $\mathrm{Pr}_0(2 \mid p)f_2p$,

respectively.













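Hence the probability that $u_1 = 1$ may be computed by summing these three probabilities. That is,

$$
\begin{aligned}
\Pr(u_1 = 1 \mid p) = \Pr_1(1 \mid p) &= \Pr_0(1 \mid p)\,[s_1 + f_1(1 - p)] + \Pr_0(2 \mid p)\, f_2 p \\
&= \Pr_0(1 \mid p)\,(1 - f_1 p - f_2 p) + f_2 p,
\end{aligned}
\tag{45}
$$

since $\Pr_0(2 \mid p) = 1 - \Pr_0(1 \mid p)$. Similarly,

$$
\begin{aligned}
\Pr_2(1 \mid p) &= \Pr_1(1 \mid p)\,(1 - f_1 p - f_2 p) + f_2 p, \\
\Pr_3(1 \mid p) &= \Pr_2(1 \mid p)\,(1 - f_1 p - f_2 p) + f_2 p,
\end{aligned}
\tag{46}
$$

and

$$
\Pr_z(1 \mid p) = \Pr_{z-1}(1 \mid p)\,(1 - f_1 p - f_2 p) + f_2 p. \tag{47}
$$

Since it is here assumed that $\Pr_0(1 \mid p) = 1/2$,

$$
\begin{aligned}
\Pr_1(1 \mid p) &= [1 - p(f_1 + f_2)]/2 + f_2 p, \\
\Pr_2(1 \mid p) &= [1 - p(f_1 + f_2)]^2/2 + f_2 p\,[1 - p(f_1 + f_2)] + f_2 p, \\
\Pr_3(1 \mid p) &= [1 - p(f_1 + f_2)]^3/2 + f_2 p\,[1 - p(f_1 + f_2)]^2 + f_2 p\,[1 - p(f_1 + f_2)] + f_2 p,
\end{aligned}
\tag{48}
$$

and

$$
\begin{aligned}
\Pr_z(1 \mid p) &= [1 - p(f_1 + f_2)]^z/2 + f_2 p \sum_{k=0}^{z-1} [1 - p(f_1 + f_2)]^k \\
&= [1 - p(f_1 + f_2)]^z/2 + f_2 p\,\frac{1 - [1 - p(f_1 + f_2)]^z}{1 - [1 - p(f_1 + f_2)]} \\
&= [1 - p(f_1 + f_2)]^z/2 + f_2\,\frac{1 - [1 - p(f_1 + f_2)]^z}{f_1 + f_2} \\
&= \frac{f_2}{f_1 + f_2} + \left[\frac{1}{2} - \frac{f_2}{f_1 + f_2}\right][1 - p(f_1 + f_2)]^z \\
&= \frac{L_1}{L_1 + L_2} + \frac{L_2 - L_1}{2(L_1 + L_2)}\,[1 - p(f_1 + f_2)]^z.
\end{aligned}
\tag{49}
$$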





The quantity $\Pr_z(1 \mid p)$ given by equation (49) represents the probability that, if an item is drawn at random from the $2N$ items of which the population is initially composed, its descendant will be of type 1 at the $z$th inspection. Furthermore, if an item is drawn at random from the $2N$ items of which the population is composed at the $z$th inspection, the probability that it is of type 1 is also $\Pr_z(1 \mid p)$.
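As a numerical check of the step from the difference equation (47) to the closed form (49), the Python sketch below iterates (47) from $\Pr_0(1 \mid p) = 1/2$ and compares the result with (49). It assumes, as the last line of (49) implies, that $L_i = 1/f_i$; the function names and parameter values are illustrative only.

```python
def pr_type1_recursive(f1: float, f2: float, p: float, z: int, pr0: float = 0.5) -> float:
    """Iterate the difference equation (47) z times, starting from Pr_0(1|p)."""
    pr = pr0
    for _ in range(z):
        pr = pr * (1 - f1 * p - f2 * p) + f2 * p
    return pr


def pr_type1_closed(f1: float, f2: float, p: float, z: int) -> float:
    """Closed form (49), written with the assumed longevities L_i = 1/f_i."""
    L1, L2 = 1 / f1, 1 / f2
    c = 1 - p * (f1 + f2)
    return L1 / (L1 + L2) + (L2 - L1) / (2 * (L1 + L2)) * c ** z


# Example: f1 = 0.1, f2 = 0.3 (type 1 is the longer-lived type), switch policy p = 1.
print(pr_type1_recursive(0.1, 0.3, 1.0, 5), pr_type1_closed(0.1, 0.3, 1.0, 5))
```

Both calls agree up to rounding error, and as $z$ grows they approach $L_1/(L_1 + L_2)$.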











LIFE OF EQUIPMENT UNDER OPERATIONAL CONDITIONS 525 


A2. The Mean Square Error 


We shall prove that among all replacement policies of the type investigated in Section 2.1, the switch policy leads to estimates which have the smallest mean square error. The mean square error is (see Section 2.1.3)

$$\frac{\Pr_z(1 \mid p)\,\Pr_z(2 \mid p)}{2n} + B^2(p). \tag{50}$$

In Section 2.1.3 it is seen that the absolute bias $|B(p)|$ is a monotonically decreasing function of $p$. Hence $B^2(p)$ is minimized when the switch policy is adopted (i.e., $p = 1$). It is therefore sufficient to prove that $\Pr_z(1 \mid p)\Pr_z(2 \mid p)$ is a monotonically decreasing function of $p$. By (49), and since $\Pr_z(2 \mid p) = 1 - \Pr_z(1 \mid p)$,

$$\left|\tfrac{1}{2} - \Pr_z(1 \mid p)\right| = \left|\tfrac{1}{2} - \Pr_z(2 \mid p)\right| = \frac{|L_1 - L_2|}{2(L_1 + L_2)}\left\{1 - [1 - p(f_1 + f_2)]^z\right\}, \tag{51}$$

which is a monotonically increasing function of $p$; hence the distance between $\Pr_z(1 \mid p)$ (or $\Pr_z(2 \mid p)$) and 1/2 is maximized when $p = 1$. Since $x(1 - x) = \tfrac{1}{4} - (x - \tfrac{1}{2})^2$, the function $\Pr_z(1 \mid p)[1 - \Pr_z(1 \mid p)] = \Pr_z(1 \mid p)\Pr_z(2 \mid p)$ is minimized when $p = 1$. Therefore, the mean square error is also minimized when $p = 1$.
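The monotonicity argument can be checked numerically. The Python sketch below evaluates the first term of (50), $\Pr_z(1 \mid p)\Pr_z(2 \mid p)/2n$, and the distance (51) on a grid of $p$ values. It assumes $L_i = 1/f_i$ and $f_1 + f_2 \le 1$; the parameter values are arbitrary illustrations, and the bias term is left out because its definition is in Section 2.1.3.

```python
def pr_type1(f1, f2, p, z):
    """Pr_z(1|p) from (49), assuming L_i = 1/f_i."""
    L1, L2 = 1 / f1, 1 / f2
    c = 1 - p * (f1 + f2)
    return L1 / (L1 + L2) + (L2 - L1) / (2 * (L1 + L2)) * c ** z


def variance_term(f1, f2, p, z, n):
    """First term of (50): Pr_z(1|p) Pr_z(2|p) / 2n."""
    pr1 = pr_type1(f1, f2, p, z)
    return pr1 * (1 - pr1) / (2 * n)


def distance_from_half(f1, f2, p, z):
    """|1/2 - Pr_z(1|p)|, the quantity in (51)."""
    return abs(0.5 - pr_type1(f1, f2, p, z))


ps = [k / 10 for k in range(1, 11)]
variances = [variance_term(0.1, 0.3, p, z=5, n=20) for p in ps]
distances = [distance_from_half(0.1, 0.3, p, z=5) for p in ps]
```

On this grid the variance term falls and the distance from 1/2 rises as $p$ increases toward the switch policy.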


A3. A Theorem on Binomial Variates 


We shall prove that the quantity $P$ given by equation (7) is a decreasing function of $\Pr_z(1 \mid p)$; that is, with $g = 1 - \Pr_z(1 \mid p)$,

$$P(g) = \sum_{a=0}^{d} \binom{n}{a}(1 - g)^a g^{n-a}$$

is an increasing function of $g$ when $0 < g < 1$ and $d < n$. This follows from the fact that the derivative of $P(g)$ with respect to $g$,

$$\frac{\partial P(g)}{\partial g} = \frac{\partial}{\partial g}\sum_{a=0}^{d} \binom{n}{a}(1 - g)^a g^{n-a}, \tag{52}$$

is equal to

$$n\binom{n-1}{d}(1 - g)^d g^{n-1-d}, \tag{53}$$

which may be proved by mathematical induction on $d$. Since the derivative is positive, $P(g)$ is an increasing function of $g$.

This result is of interest in itself when worded as follows: Let $X$ be the number of successes in $n$ Bernoulli trials where the probability of success for a trial is $p_1$, and let $Y$ be the number of successes when the probability of success for a trial is $p_2$. Then $X$ is stochastically larger than $Y$ if and only if $p_1$ is larger than $p_2$. This fact follows from the preceding result, since we have seen that the probability that $X$ will be less than or equal to $d$ is an increasing function of $1 - p_1$ and, hence, a decreasing function of $p_1$.
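The derivative formula (53) is easy to confirm numerically. The Python sketch below compares it with a central finite difference of $P(g)$; here $P(g)$ is the probability of at most $d$ successes in $n$ trials when each success has probability $1 - g$. The step size and the grid of $(n, d, g)$ values are arbitrary choices.

```python
from math import comb


def P(g, n, d):
    """P(g) = sum_{a=0}^{d} C(n, a) (1-g)^a g^(n-a)."""
    return sum(comb(n, a) * (1 - g) ** a * g ** (n - a) for a in range(d + 1))


def dP_closed(g, n, d):
    """The closed-form derivative (53): n C(n-1, d) (1-g)^d g^(n-1-d)."""
    return n * comb(n - 1, d) * (1 - g) ** d * g ** (n - 1 - d)


def dP_numeric(g, n, d, h=1e-6):
    """Central finite-difference approximation to dP/dg."""
    return (P(g + h, n, d) - P(g - h, n, d)) / (2 * h)
```

Because (53) is positive for $0 < g < 1$ and $d < n$, the binomial distribution function at $d$ rises with $g$, which is the stochastic-ordering statement above.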


A4. An Impossibility Theorem 


We shall prove that it is impossible to devise a “simple” replacement policy which would lead asymptotically to exclusive usage of the type with the greater longevity. A replacement policy is defined as “simple” if the policy determines the number of replacements of each type using at most the following two facts: (1) the population composition after the last inspection, and (2) the number of failures of each type since the last inspection. In other words, a “simple” replacement policy is an integer-valued random variable $R(m_1, m_2, c)$ whose distribution depends on the number $m_1$ of failures of type 1 equipment since the last inspection, the number $m_2$ of failures of type 2 equipment since the last inspection, and the number $c$ of type 1 pieces of equipment in the population after the last inspection. The random variable $R(m_1, m_2, c)$ denotes the number of type 1 pieces used as replacements. For example, for the 50-50 policy $R(m_1, m_2, c)$ is a binomial variate with parameters $m_1 + m_2$ and 1/2. For the switch policy $R(m_1, m_2, c) = m_2$ (that is, it is a binomial variate with parameters $m_2$ and 1). All replacement policies of the symmetric type investigated in Section 2.1 are “simple” policies. If the chance is $p$ that the replacement for an item which failed will be of the opposite type, $R(m_1, m_2, c)$ is the sum of two binomial variates, the first having the parameters $m_2$ and $p$, the second having the parameters $m_1$ and $1 - p$. Since the distribution of $R(m_1, m_2, c)$ for the symmetric type of replacement policies depends only on $m_1$ and $m_2$, we could devise replacement policies which were “simple” but were not of the symmetric type. Of course, replacement policies which depend on $c$ would not be simpler than rules which do not depend on knowing the total population composition after the last inspection.
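A minimal Python sketch of the “simple” policy $R(m_1, m_2, c)$ for the symmetric family: each type 2 failure is replaced by type 1 with chance $p$, and each type 1 failure is replaced by type 1 with chance $1 - p$. The function name and the use of Python's `random` module are illustrative assumptions. The composition $c$ is never used, which is exactly what makes the policy symmetric.

```python
import random


def sample_R(m1: int, m2: int, p: float, rng: random.Random) -> int:
    """Draw R(m1, m2, c) for the symmetric policy with switching chance p.

    R is Binomial(m2, p) + Binomial(m1, 1 - p); c is not needed.
    """
    from_type2 = sum(rng.random() < p for _ in range(m2))      # type-2 failures switched to type 1
    from_type1 = sum(rng.random() < 1 - p for _ in range(m1))  # type-1 failures kept as type 1
    return from_type2 + from_type1


rng = random.Random(0)
switch = sample_R(3, 5, 1.0, rng)      # switch policy: always m2
draws = [sample_R(3, 5, 0.5, rng) for _ in range(20_000)]
mean_50_50 = sum(draws) / len(draws)   # close to (m1 + m2)/2 = 4
```

With $p = 1$ every draw equals $m_2$, and with $p = 1/2$ the draws behave like a binomial variate with parameters $m_1 + m_2$ and 1/2, matching the two examples in the text.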

Since the distribution of $m_1$ depends only on $c$ and $L_1$, and the distribution of $m_2$ depends only on $2N - c$ and $L_2$, the expected number of type 1 items in the population after replacement is a function $G(c, L_1, L_2)$ of the composition $c$ after the preceding inspection and of the longevities $L_1$ and $L_2$. Writing $\Pr_z(c)$ for the probability that the population will contain $c$ items of type 1 at the $z$th inspection, we have that the expected number of type 1 items in the population at the following inspection will be

$$E_{z+1}(c) = \sum_{c=0}^{2N} G(c, L_1, L_2)\,\Pr_z(c).$$

If the replacement policy leads asymptotically to exclusive usage of the type with the greater longevity,

$$\lim_{z \to \infty} \Pr_z(2N) = 1 \quad \text{and} \quad \lim_{z \to \infty} E_{z+1}(c) = 2N \tag{54}$$

when $L_1 > L_2$. Hence it is necessary that $G(2N, L_1, L_2) = 2N$ and $G(0, L_1, L_2) = 0$, which means that once the population consists exclusively of one type of equipment, only that type will be used as replacements at future inspections. Therefore the probability is at least $\Pr_z(2N)$ that type 2 equipment will no longer be used as replacements in any inspection following the $z$th inspection. Since the probability is at least $\Pr_z(2N) + \Pr_z(0)$ that the decision to use one type of equipment exclusively will be made at the $z$th inspection or before, there is a positive probability that this decision will be incorrect. Hence there is no certainty that the replacement policy will lead asymptotically to exclusive usage of the type with the greater longevity.


A5. The Necessary Supply of Replacements 


Consider the history of a single item in the initial population. A replacement of type 1 was used at the $z$th inspection if either (a) the descendant at inspection $z - 1$ was of type 1, failed by the $z$th inspection, and was replaced by the same type (chance is $1 - p$), or (b) the descendant at inspection $z - 1$ was of type 2, failed by the $z$th inspection, and was replaced by the opposite type (chance is $p$). Hence the probability that a replacement of type 1 was used at the $z$th inspection is

$$
\begin{aligned}
\Pr_z(r = 1 \mid p) &= \Pr_{z-1}(1 \mid p)\, f_1(1 - p) + \Pr_{z-1}(2 \mid p)\, f_2 p \\
&= \frac{1}{L_1 + L_2} + \left\{\frac{f_1(1 - p) + f_2 p}{2} - \frac{1}{L_1 + L_2}\right\}[1 - p(f_1 + f_2)]^{z-1}.
\end{aligned}
\tag{55}
$$

When the replacement policy is in use for a long period of time, about the same number of each type of equipment will be used as replacements; that is,

$$\Pr_z(r = 1 \mid p) \approx \Pr_z(r = 2 \mid p) \approx \frac{1}{L_1 + L_2}. \tag{56}$$


The probability that a replacement will be made at the $z$th inspection is

$$
\begin{aligned}
\Pr_z(r \mid p) &= \Pr_z(r = 1 \mid p) + \Pr_z(r = 2 \mid p) \\
&= \frac{2}{L_1 + L_2} + \left[\frac{f_1 + f_2}{2} - \frac{2}{L_1 + L_2}\right][1 - p(f_1 + f_2)]^{z-1} \\
&= \frac{2}{L_1 + L_2} + \frac{(L_1 - L_2)^2}{2L_1 L_2 (L_1 + L_2)}\,[1 - p(f_1 + f_2)]^{z-1}.
\end{aligned}
\tag{57}
$$

Since the second term on the right side of this equality is positive and decreases as $p$ increases, the probability that a replacement will be made is a decreasing function of $p$ and assumes its minimum when $p = 1$. That is, when the switch policy is adopted, fewer replacements will be needed on the average.
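The chain from (47) through (55) to (57) can be checked by computing $\Pr_z(r \mid p)$ directly from the recursion and comparing it with the last line of (57). The Python sketch below does this under the assumption $L_i = 1/f_i$; the type 2 case is obtained from (55) by exchanging the roles of the two types, and the parameter values are illustrative.

```python
def replacement_prob(f1, f2, p, z):
    """Pr_z(r|p) for z >= 1, built directly from the recursion (47) and the
    two cases of (55): r1 is a type-1 replacement, r2 a type-2 replacement."""
    pr1 = 0.5                                  # Pr_0(1|p) = 1/2
    for _ in range(z - 1):                     # advance to Pr_{z-1}(1|p)
        pr1 = pr1 * (1 - f1 * p - f2 * p) + f2 * p
    pr2 = 1 - pr1
    r1 = pr1 * f1 * (1 - p) + pr2 * f2 * p     # first line of (55)
    r2 = pr2 * f2 * (1 - p) + pr1 * f1 * p     # same argument with the types exchanged
    return r1 + r2


def replacement_prob_closed(f1, f2, p, z):
    """Last line of (57), assuming L_i = 1/f_i."""
    L1, L2 = 1 / f1, 1 / f2
    c = 1 - p * (f1 + f2)
    return 2 / (L1 + L2) + (L1 - L2) ** 2 / (2 * L1 * L2 * (L1 + L2)) * c ** (z - 1)


probs = [replacement_prob(0.1, 0.3, k / 10, z=4) for k in range(1, 11)]
```

The list `probs` decreases as $p$ runs from 0.1 to 1, in line with the conclusion that the switch policy needs the fewest replacements.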


A6. Comparison of the Switch and 50-50 Policies 


We shall prove that to reduce the bias of the estimates of longevities by any specified amount takes at least twice as many inspections under the 50-50 policy as under the switch policy. From equation (4), we see that the bias of the estimate $n_1/2n$ when the 50-50 policy is adopted for $y$ inspections will be greater than the bias when the switch policy is used for $z$ inspections, as long as

$$[1 - (f_1 + f_2)/2]^y > [1 - (f_1 + f_2)]^z. \tag{58}$$

When $y \le 2z$, assuming $f_1 + f_2 > 0$, we have that

$$[1 - (f_1 + f_2)/2]^y \ge [1 - (f_1 + f_2)/2]^{2z} = [1 - (f_1 + f_2) + (f_1 + f_2)^2/4]^z > [1 - (f_1 + f_2)]^z. \tag{59}$$

Hence, when $y \le 2z$, the bias using the 50-50 policy is greater than when the switch policy is used.
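The factor-of-two result can be illustrated by searching for the smallest number of 50-50 inspections $y$ whose bias factor in (58) is no larger than the switch policy's factor after $z$ inspections. The Python sketch below assumes $0 < f_1 + f_2 < 1$ (so the search terminates); the function name is hypothetical.

```python
def min_inspections_50_50(f1, f2, z):
    """Smallest y with [1 - (f1+f2)/2]^y <= [1 - (f1+f2)]^z, i.e. the number of
    50-50 inspections needed to bring the bias factor down to the level the
    switch policy reaches after z inspections (requires 0 < f1 + f2 < 1)."""
    s = f1 + f2
    target = (1 - s) ** z
    y = 0
    while (1 - s / 2) ** y > target:
        y += 1
    return y


print(min_inspections_50_50(0.1, 0.3, 3))   # -> 7, which exceeds 2z = 6
```

By (59) the returned value always exceeds $2z$, whatever the failure probabilities.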


A7. An Advantage to Replacement at Each Inspection 


Consider a modified form of the symmetric type of replacement policies defined in Section 2.1.1, where replacements are made, if necessary, only on inspections $z, 2z, 3z, \cdots$. We shall prove the statement made in Section 2.1.11 that, if the replacement policies are used for a long period of time, the bias and mean square error of the estimates of relative longevities will be minimized, and the power of the test concerning relative longevities will be maximized, when items which have failed are replaced at the next inspection; that is, when $z = 1$.

Let $f_i$ be the probability that an item of type $i$ will fail in the time interval between successive inspections if it did not fail at the earlier inspection. Then the chance that the item will fail in the time interval between inspections $jz$ and $(j+1)z$ is

$$f_i + s_i f_i + s_i^2 f_i + \cdots + s_i^{z-1} f_i = 1 - s_i^z = f_i(z).$$

If the replacement policy is used for a long period of time, the number of items of type 1 in a random sample of $2n$ items from the population at a late inspection (that is, after many inspections) will have approximately a binomial distribution with parameters $2n$ and $g_1(z) = f_2(z)/[f_1(z) + f_2(z)]$. When $z = 1$, $g_1(1) = f_2(1)/[f_1(1) + f_2(1)] = L_1/(L_1 + L_2)$. In order to prove that the bias and mean square error of the estimate $n_1/2n$ are minimized when $z = 1$, it is sufficient to prove













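that $g_1(z)$ is a monotonically decreasing function of $z$ when $L_1 > L_2$. The function $g_1(z)$ is monotonically decreasing if $g_1(z)/[1 - g_1(z)]$ is monotonically decreasing. Hence it is sufficient to show that $g_1(z)/[1 - g_1(z)] > g_1(z+1)/[1 - g_1(z+1)]$ when $L_1 > L_2$; that is,

$$\frac{1 - s_2^z}{1 - s_1^z} > \frac{1 - s_2^{z+1}}{1 - s_1^{z+1}} \tag{60}$$

or

$$\frac{1 - s_1^z}{1 - s_1^{z+1}} < \frac{1 - s_2^z}{1 - s_2^{z+1}}. \tag{61}$$

Since $L_1 > L_2$ implies $s_1 > s_2$, this last inequality holds if $h(s) = [1 - s^z]/[1 - s^{z+1}]$ is a decreasing function of $s$ for $0 < s < 1$. The derivative of $h(s)$ is

$$h'(s) = \frac{s^{z-1}\,[-z + (z+1)s - s^{z+1}]}{[1 - s^{z+1}]^2}, \tag{62}$$

which is negative for $0 < s < 1$, because the bracketed factor vanishes at $s = 1$ and its derivative, $(z+1)(1 - s^z)$, is positive for $0 < s < 1$.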
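A quick numerical illustration of this result: the Python sketch below computes $g_1(z)$ from $f_i(z) = 1 - s_i^z$ with $s_i = 1 - f_i$ and evaluates $h(s)$ on a grid. The values $f_1 = 0.1$, $f_2 = 0.3$ (so that type 1 is the longer-lived type) and the grids are arbitrary choices.

```python
def g1(z, f1, f2):
    """Limiting proportion of type-1 items when failures are replaced only at
    inspections z, 2z, 3z, ...; uses f_i(z) = 1 - s_i**z with s_i = 1 - f_i."""
    f1z = 1 - (1 - f1) ** z
    f2z = 1 - (1 - f2) ** z
    return f2z / (f1z + f2z)


def h(s, z):
    """h(s) = (1 - s^z) / (1 - s^(z+1)), whose derivative is (62)."""
    return (1 - s ** z) / (1 - s ** (z + 1))


values = [g1(z, 0.1, 0.3) for z in range(1, 21)]   # L1 > L2 since f1 < f2
```

`values[0]` equals $L_1/(L_1 + L_2) = 0.75$, and the later entries drift down toward 1/2, so less frequent replacement pulls the sampled proportion away from the target.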


A8. The Differential Equation 


Let us assume that the chance is $f_i\,dt$ that an item of type $i$ which had not failed by time $t$ will fail before time $t + dt$. Consider the type of replacement policy where the chance is $p$ that a replacement will be of the opposite type from the item which fails, and where there is instantaneous replacement upon failure. Consider the history of a single item in the initial population. Then the chance that its descendant will be of type 1 at time $t + dt$ is

$$
\begin{aligned}
\Pr_{t+dt}(1 \mid p) &= \Pr_t(1 \mid p)\,[f_1\,dt\,(1 - p) + 1 - f_1\,dt] + \Pr_t(2 \mid p)\,(f_2 p\,dt) \\
&= \Pr_t(1 \mid p)\,[1 - (f_1 p + f_2 p)\,dt] + f_2 p\,dt.
\end{aligned}
\tag{63}
$$


This equation is analogous to the difference equation (47). Writing $P(t) = \Pr_t(1 \mid p)$, we have

$$P(t + dt) - P(t) = [f_2 - P(t)(f_1 + f_2)]\,p\,dt \tag{64}$$

and

$$\frac{dP(t)}{dt} = [f_2 - P(t)(f_1 + f_2)]\,p. \tag{65}$$

Since it is here assumed that $P(0) = 1/2$, the solution of this differential equation is

$$P(t) = \frac{L_1}{L_1 + L_2} + \frac{L_2 - L_1}{2(L_1 + L_2)}\,e^{-p(f_1 + f_2)t}. \tag{66}$$

This solution is analogous to the solution (equation 49) of the difference equation (47), upon which most of the results in the preceding sections were based.
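The analogy between (47) and (65) can be made concrete: a forward-Euler step for (65) is exactly the update (63) with a small $dt$. The Python sketch below integrates (65) this way and compares the result with (66), assuming $L_i = 1/f_i$; the step count and parameter values are arbitrary.

```python
import math


def p_closed(t, f1, f2, p):
    """Closed-form solution (66), assuming L_i = 1/f_i."""
    L1, L2 = 1 / f1, 1 / f2
    return L1 / (L1 + L2) + (L2 - L1) / (2 * (L1 + L2)) * math.exp(-p * (f1 + f2) * t)


def p_euler(t_end, f1, f2, p, steps=100_000):
    """Forward-Euler integration of (65) from P(0) = 1/2.  Each step is the
    update (63) with dt = t_end / steps."""
    dt = t_end / steps
    P = 0.5
    for _ in range(steps):
        P += (f2 - P * (f1 + f2)) * p * dt
    return P
```

With 100,000 steps the two agree to better than $10^{-4}$ over the times tried below, and both approach $L_1/(L_1 + L_2)$ as $t$ grows.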


REFERENCES 


[1] Brown, George W., and Flood, Merrill, “Tumbler mortality,” Journal of the 
American Statistical Association, 42 (1947), 562-74. 

[2] Cramér, Harald, Mathematical Methods of Statistics, Princeton: Princeton 
University Press, 1946. 

[3] Feller, William, An Introduction to Probability Theory and Its Applications, 
Vol. 1. New York: John Wiley and Sons, Inc., 1950. 

[4] Glathart, J. L., and Preston, F. W., “Theory of the behavior of glassware in 
service,” Journal of the American Ceramic Society, 31 (1948), 153-70. 

[5] Statistical Research Group, Columbia University, Sequential Analysis of 
Statistical Data: Applications. New York: Columbia University Press, 1945. 

[6] Wald, Abraham, Sequential Analysis. New York: John Wiley and Sons, Inc., 1947.



















ON SOME PROCEDURES FOR THE REJECTION OF 
SUSPECTED DATA 


E. P. King
National Bureau of Standards 


Most of the statistical tests that have been recommended for the
detection of a single outlier involve the difference between the 
largest (or smallest) observation and some measure of the location of 
the remaining members of the sample. In many situations, however, 
there is no a priori basis for anticipating which extreme will be under 
suspicion, with the result that this decision is based entirely on sam- 
ple evidence. The test statistic in such cases actually employs the “more 
deviant” extreme, and this reordering is seldom taken into account. In 
this note we shall show that, for two statistics commonly used to de- 
tect the presence of a single outlier, the effect of this “two-sided” hy- 
pothesis is approximately, but not exactly, to double the significance 
level of the standard test procedure. 
In cases where an accurate estimate of the population standard deviation, $\sigma$, is available, let us consider the statistic

$$u_n = \frac{x_n - \bar{x}}{\sigma} \quad \text{or} \quad u_1 = \frac{\bar{x} - x_1}{\sigma},$$

where $x_n$ and $x_1$ are the largest and smallest observations, respectively, in a sample of $n$, and $\bar{x}$ is the sample mean. Under the null hypothesis of random sampling from a normal population, the distribution of $u_n$ (or $u_1$, since the two are identically distributed) has been tabulated by Grubbs [2]. Let

$$u = \max(u_1, u_n)$$

under the same null hypothesis, and let $G_n(t)$ denote the distribution function of $u$ in samples of $n$. We have at once

$$G_n(t) = P(u_1 < t,\ u_n < t). \tag{1}$$

In case the sample size, $n$, is large, it follows from the asymptotic independence of $u_1$ and $u_n$ that

$$G_n(t) \approx F_n^2(t)$$

approximately, where $F_n(t)$ denotes the common distribution function of $u_1$ and $u_n$. This is equivalent to

$$1 - G_n(t) \approx [1 - F_n(t)]\,[2 - \{1 - F_n(t)\}].$$

Thus the $100\alpha$ per cent point of $u_n$ (or $u_1$) is the $100\alpha(2 - \alpha)$ per cent point of $u$. For practical purposes, where $\alpha < .10$, the $100\alpha$ per cent point of $u_n$ is approximately the same as the $200\alpha$ per cent point of $u$, which means that the critical values of $u_n$ can be used provided that the significance level is doubled.
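The large-sample relation $1 - G_n = \alpha(2 - \alpha)$ is simple enough to tabulate directly. The Python sketch below treats $u_1$ and $u_n$ as independent, as in the asymptotic argument above, and computes the level attained by $u$ when the $100\alpha$ per cent point of $u_n$ is used; the function name is illustrative.

```python
def two_sided_level(alpha):
    """Large-sample level of u = max(u_1, u_n) when the upper 100*alpha per cent
    point of u_n is used: 1 - F^2 = alpha * (2 - alpha), treating u_1 and u_n
    as independent."""
    return alpha * (2 - alpha)


levels = {a: two_sided_level(a) for a in (0.01, 0.05, 0.10)}
```

The three values, .0199, .0975 and .19, correspond to the large-sample row of Table I and show how close "doubling" is for small $\alpha$.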

For small samples, where $n = 3$ and $n = 4$, the distribution function of $u$ is available in [3]. Using this function and the known percentage points of $u_n$, the corresponding significance levels for the statistic $u$ are obtained. The results are given in Table I.

TABLE I

SIGNIFICANCE LEVEL FOR “u” CORRESPONDING TO A GIVEN SIGNIFICANCE LEVEL FOR “u_n”

            P{u ≥ u_n(α)}
  n     α = .01    α = .05    α = .10
  3      .018       .084       .159
  4      .020       .089       .170
  ∞      .020       .097       .190

















In cases where $\sigma$ is unknown, Dixon's statistic [1] of the form

$$r_n = \frac{x_n - x_{n-1}}{x_n - x_1} \quad \left[\text{or } r_1 = \frac{x_2 - x_1}{x_n - x_1}\right]$$

may be used, where $x_1 < x_2 < \cdots < x_{n-1} < x_n$ is an ordered random sample of $n$ from a normal population. If we let

$$r = \max(r_1, r_n)$$

and denote the distribution function of $r$ by $S_n(z)$, we have

$$
\begin{aligned}
S_n(z) &= P(r_1 < z,\ r_n < z) \\
&= P(r_1 < z) + P(r_n < z) - P(r_1 < z \text{ or } r_n < z).
\end{aligned}
$$

Since $P(r_1 < z \text{ or } r_n < z) \le 1$, we obtain the inequality

$$S_n(z) \ge P(r_1 < z) + P(r_n < z) - 1.$$

Finally, letting $R_n(z)$ denote the common distribution function of $r_1$ and $r_n$ in samples of size $n$, the above inequality becomes

$$S_n(z) \ge 2R_n(z) - 1 \quad \text{for } 0 \le z \le 1,$$







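which is equivalent to

$$2[1 - R_n(z)] \ge 1 - S_n(z), \qquad 0 \le z \le 1. \tag{2}$$

Hence, if the upper $100\alpha$ per cent point of $r_n$ is used as the critical value for $r$, the significance level is at most $200\alpha$ per cent, regardless of sample size. It is easily verified that (2) becomes an equality for $\tfrac{1}{2} \le z \le 1$ (since $r_1 + r_n \le 1$, the two ratios cannot both exceed $\tfrac{1}{2}$) and a strict inequality for $0 < z < \tfrac{1}{2}$.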
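The inequality (2) and its equality for $z \ge \tfrac{1}{2}$ can be seen in simulated data. The Python sketch below draws normal samples (size 5 and the replication count are arbitrary assumptions) and counts how often $r_1$, $r_n$ and $r = \max(r_1, r_n)$ exceed a threshold $c$. It is a Monte Carlo illustration, not the exact calculation used above.

```python
import random


def dixon_ratios(sample):
    """Return (r_1, r_n) = ((x_2 - x_1)/(x_n - x_1), (x_n - x_{n-1})/(x_n - x_1))."""
    x = sorted(sample)
    spread = x[-1] - x[0]
    return (x[1] - x[0]) / spread, (x[-1] - x[-2]) / spread


def exceedance_counts(c, n=5, reps=10_000, seed=1):
    """Simulate normal samples of size n and count how often r_1, r_n and
    r = max(r_1, r_n) exceed c."""
    rng = random.Random(seed)
    k1 = kn = kr = 0
    for _ in range(reps):
        r1, rn = dixon_ratios([rng.gauss(0.0, 1.0) for _ in range(n)])
        k1 += r1 > c
        kn += rn > c
        kr += max(r1, rn) > c
    return k1, kn, kr
```

For every threshold, the count for $r$ never exceeds the sum of the counts for $r_1$ and $r_n$, the sample analogue of (2); for $c \ge \tfrac{1}{2}$ the two are equal.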


REFERENCES 


[1] Dixon, W. J., “Ratios involving extreme values,” Annals of Mathematical Statistics, 22 (1951), 68-78.

[2] Grubbs, F. E., “Sample criteria for testing outlying observations,” Annals of Mathematical Statistics, 21 (1950), 27-58.

[3] King, E. P., “The operating characteristic of the control chart for sample 
means,” Annals of Mathematical Statistics, 23 (1952), 384-95. 











ON THE USE OF RANGES, CROSS-RANGES AND 
EXTREMES IN COMPARING SMALL SAMPLES 


HANNES HYRENIUS
University of Gothenburg 


1. INTRODUCTION 


IN ORDER to simplify and reduce the arithmetical work involved in analyses of statistical quality control, attempts have been made to derive useful substitutes for the ordinary methods of comparing two samples. Among these, mention may be made of the quotient of two ranges or two mean ranges as alternatives to the variance ratio in testing for homogeneity in variation. In a recent article [3] Paul R. Rider presents the distribution of the quotient of two sample ranges, the universe being defined by the rectangular distribution.

The purpose of the present article is to describe a procedure by which 
it is possible to test not only differences in variation but also differences 
in location. The method is based on the assumption of a rectangular 
universe. 

The definitions and the derivation of the sampling distributions are 
given in Sections 2-6. Section 7 shows the relation of the new test of 
variation to that studied by Rider. Section 8 gives a discussion of the 
proposed test statistics, indicating their usefulness and limitation. Sec- 
tion 9 explains the test tables given as an appendix. Finally, some nu- 
merical examples are presented in Section 10. 


2. DEFINITIONS 


The universe to be sampled is the rectangular distribution

$$p(x)\,dx = \frac{1}{B}\,dx, \qquad 0 \le x \le B. \tag{1}$$

We adopt the following notation:

Sample 1: $N_1$ items; lower extreme $= u_1$; upper extreme $= v_1$.
Sample 2: $N_2$ items; lower extreme $= u_2$; upper extreme $= v_2$.

We choose as our sample 1 the sample having the smaller lower extreme; thus $u_1 \le u_2$. If $u_1 = u_2$, sample 1 may be taken as the sample first obtained.

The ranges are

$$R_{11} = v_1 - u_1, \tag{2a}$$


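$$R_{22} = v_2 - u_2. \tag{2b}$$

We define the cross-range as

$$R_{21} = v_2 - u_1, \tag{3}$$

and the lower-extreme-difference as

$$S_{21} = u_2 - u_1. \tag{4}$$

The analysis is to be performed by means of the following test quotients:

$$T = \frac{S_{21}}{R_{11}} = \frac{u_2 - u_1}{v_1 - u_1}, \tag{5a}$$

$$U = \frac{R_{22}}{R_{11}} = \frac{v_2 - u_2}{v_1 - u_1}, \tag{5b}$$

$$V = \frac{R_{21}}{R_{11}} = \frac{v_2 - u_1}{v_1 - u_1}. \tag{5c}$$

The three quotients are related by the equation

$$T + U = V. \tag{6}$$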
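Definitions (2a)-(5c) translate directly into code. The Python sketch below computes $T$, $U$ and $V$ from two samples, applying the rule that the sample with the smaller minimum is sample 1. The tie rule (keep the first argument) stands in for "the sample first obtained", and sample 1 is assumed to have a positive range.

```python
def tuv(sample_a, sample_b):
    """Quotients T, U, V of (5a)-(5c).  The sample with the smaller minimum is
    taken as sample 1; on a tie the first argument is kept, standing in for
    'the sample first obtained'.  Sample 1 must have a positive range."""
    if min(sample_b) < min(sample_a):
        sample_a, sample_b = sample_b, sample_a
    u1, v1 = min(sample_a), max(sample_a)
    u2, v2 = min(sample_b), max(sample_b)
    r11 = v1 - u1                       # range of sample 1, eq. (2a)
    T = (u2 - u1) / r11                 # lower-extreme difference over R11
    U = (v2 - u2) / r11                 # range ratio R22 / R11
    V = (v2 - u1) / r11                 # cross-range ratio R21 / R11
    return T, U, V


# Example B of Section 10 (groove locations); sample B becomes sample 1.
print(tuv([4, 3, 0, 1], [0, -1, 0, -2]))   # (1.0, 2.0, 3.0)
```

Only the extremes enter, so Rider's data of Example A can be passed as `tuv([72, 80], [75, 79])`, giving (0.375, 0.5, 0.875).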


3. THE DISTRIBUTION OF T 


The distribution of $T$ can be derived in the following way.

For a general universe $p(x)$, the distribution of the lower extreme, $u$, in a sample of $N$ items is given by

$$f(u) = N\,p(u)\left[\int_u^B p(x)\,dx\right]^{N-1}. \tag{7}$$

If, specifically, $p(x)$ is the rectangular distribution given by (1), this formula reduces to

$$f(u) = \frac{N}{B^N}\,(B - u)^{N-1}. \tag{7'}$$

For the upper extreme, $v$, we have the general expression

$$f(v) = N\left[\int_0^v p(x)\,dx\right]^{N-1} p(v), \tag{8}$$

which for the rectangular case takes the form








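$$f(v) = \frac{N}{B^N}\,v^{N-1}. \tag{8'}$$

Holding the lower extreme of sample 1, $u_1$, fixed, we find that the distribution of $v_1$ is the distribution of an upper extreme in a sample of $N_1 - 1$ items lying within the range $u_1 \le x \le B$. We hence obtain

$$f_{u_1}(v_1) = \frac{N_1 - 1}{(B - u_1)^{N_1 - 1}}\,(v_1 - u_1)^{N_1 - 2}, \qquad u_1 \le v_1 \le B.$$

For $u_2$ $(\ge u_1)$ we find

$$f_{u_1}(u_2) = \frac{N_2}{(B - u_1)^{N_2}}\,(B - u_2)^{N_2 - 1}, \qquad u_1 \le u_2 \le B.$$

The joint distribution of $v_1$ and $u_2$ for a fixed value $u_1$ is hence given by the product of the last two distributions,

$$f_{u_1}(v_1, u_2) = f_{u_1}(v_1)\,f_{u_1}(u_2).$$

Writing $u_2 = u_1 + T(v_1 - u_1)$ and integrating over $v_1$ (with $y = (v_1 - u_1)/(B - u_1)$), we obtain

$$f_{u_1}(T) = f(T) = (N_1 - 1)N_2 \int_0^1 y^{N_1 - 1}(1 - Ty)^{N_2 - 1}\,dy = (N_1 - 1)N_2 \sum_{k=0}^{N_2 - 1} \binom{N_2 - 1}{k}\frac{(-T)^k}{N_1 + k}, \qquad 0 \le T \le 1, \tag{9a}$$

corresponding to the case $u_2 \le v_1$.

The distribution is independent of $u_1$ and obviously gives the general distribution of the quotient $T$ for all values of $u_1$ from 0 to $B$.

For $u_2 \ge v_1$ we obtain

$$f(T) = (N_1 - 1)N_2\,\frac{(N_1 - 1)!\,(N_2 - 1)!}{(N_1 + N_2 - 1)!}\,\frac{1}{T^{N_1}}, \qquad 1 \le T < \infty. \tag{9b}$$

It is easily seen that the total area for all values of the quotient $T$ adds up to unity.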
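Integrating (9a) and (9b) gives the upper-tail probabilities from which Table 1 is built. The Python sketch below does this term by term, assuming $N_1 \ge 2$. Its results can be compared with a few Table 1 entries (for example $P(T > 3.33) \approx 0.10$ for $N_1 = N_2 = 2$), and $P(T > 0) = 1$ confirms the unit total area.

```python
from math import comb, factorial


def t_upper_tail(t, n1, n2):
    """P(T > t) from (9a)-(9b) for a rectangular universe (n1 >= 2, n2 >= 1)."""
    # Probability mass beyond T = 1, i.e. the integral of (9b) over [1, inf).
    beyond_one = n2 * factorial(n1 - 1) * factorial(n2 - 1) / factorial(n1 + n2 - 1)
    if t >= 1:
        return beyond_one * t ** (1 - n1)        # integral of (9b) over [t, inf)
    # Integral of the polynomial form of (9a) over [t, 1], term by term.
    between = sum(comb(n2 - 1, k) * (-1) ** k / (n1 + k) * (1 - t ** (k + 1)) / (k + 1)
                  for k in range(n2))
    return beyond_one + (n1 - 1) * n2 * between
```

Solving $P(T > t) = \alpha$ for $t$ with a root finder would reproduce the table's percentage points to the printed accuracy.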
4. THE DISTRIBUTION OF U 


The distribution of the range quotient $U$ is derived from the joint distribution of $v_1$, $u_2$, and $v_2$ for a fixed value $u_1$:

$$f_{u_1}(v_1, u_2, v_2) = \frac{(N_1 - 1)N_2(N_2 - 1)}{(B - u_1)^{N_1 + N_2 - 1}}\,(v_1 - u_1)^{N_1 - 2}(v_2 - u_2)^{N_2 - 2}.$$













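Introducing $v_2 - u_2 = U(v_1 - u_1)$ and integrating over $u_2$ and $v_1$, we obtain the distribution of $U$ as

$$f_{u_1}(U) = f(U) = \frac{(N_1 - 1)N_2(N_2 - 1)}{(N_1 + N_2 - 1)(N_1 + N_2 - 2)}\left[(N_1 + N_2 - 1)U^{N_2 - 2} - (N_1 + N_2 - 2)U^{N_2 - 1}\right], \qquad 0 \le U \le 1. \tag{10a}$$

The distribution is independent of $u_1$ and is thus valid for the whole span 0 to $B$. For $U \ge 1$ we obtain

$$f(U) = \frac{(N_1 - 1)N_2(N_2 - 1)}{(N_1 + N_2 - 1)(N_1 + N_2 - 2)}\,\frac{1}{U^{N_1}}, \qquad 1 \le U < \infty. \tag{10b}$$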
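Integrating (10a) and (10b) gives the distribution function of $U$ in closed form. The Python sketch below implements it, assuming $N_1, N_2 \ge 2$. The two branches meet at $U = 1$, and the function can be used to check entries of Table 2 (for example, the upper 5 per cent point 6.67 for $N_1 = N_2 = 2$).

```python
def u_cdf(t, n1, n2):
    """P(U <= t) obtained by integrating (10a)-(10b); needs n1 >= 2, n2 >= 2."""
    s = n1 + n2
    k = (n1 - 1) * n2 * (n2 - 1) / ((s - 1) * (s - 2))   # common constant of (10a), (10b)
    if t <= 1:
        return k * ((s - 1) * t ** (n2 - 1) / (n2 - 1) - (s - 2) * t ** n2 / n2)
    return 1 - k * t ** (1 - n1) / (n1 - 1)
```

The lower branch is a polynomial, while the upper tail decays like $U^{1 - N_1}$, so the upper critical values of $U$ are large when $N_1$ is small.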





5. THE DISTRIBUTION OF V 


The distribution of the cross-range quotient $V$ is derived analogously from the joint distribution of $v_1$ and $v_2$ for a fixed value $u_1$:

$$f_{u_1}(v_1, v_2) = \frac{N_1 - 1}{(B - u_1)^{N_1 - 1}}\,(v_1 - u_1)^{N_1 - 2}\cdot\frac{N_2}{(B - u_1)^{N_2}}\,(v_2 - u_1)^{N_2 - 1}.$$

Writing $v_2 - u_1 = V(v_1 - u_1)$ and integrating, we obtain

$$f_{u_1}(V) = f(V) = \frac{(N_1 - 1)N_2}{N_1 + N_2 - 1}\,V^{N_2 - 1}, \qquad 0 \le V \le 1, \tag{11a}$$

$$f(V) = \frac{(N_1 - 1)N_2}{N_1 + N_2 - 1}\,\frac{1}{V^{N_1}}, \qquad 1 \le V < \infty. \tag{11b}$$
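Since (11a) and (11b) are elementary, the percentage points of $V$ used in Section 9 can be computed directly. The Python sketch below gives the distribution function and its inverse, assuming $N_1 \ge 2$. Values obtained this way are derived from (11a)-(11b) and are not a transcription of the published Table 3.

```python
def v_cdf(t, n1, n2):
    """P(V <= t) obtained by integrating (11a)-(11b); needs n1 >= 2."""
    s = n1 + n2
    if t <= 1:
        return (n1 - 1) / (s - 1) * t ** n2
    return 1 - n2 / (s - 1) * t ** (1 - n1)


def v_quantile(q, n1, n2):
    """The value v with P(V <= v) = q, inverting v_cdf piece by piece."""
    s = n1 + n2
    if q <= (n1 - 1) / (s - 1):                      # q <= P(V <= 1)
        return (q * (s - 1) / (n1 - 1)) ** (1 / n2)
    return ((1 - q) * (s - 1) / n2) ** (1 / (1 - n1))


# Two-tail critical values, e.g. for N1 = N2 = 4 at the 5 per cent level:
lower, upper = v_quantile(0.025, 4, 4), v_quantile(0.975, 4, 4)
```

For $N_1 = N_2 = 4$ the upper 2.5 per cent point is about 2.84, so the value $V = 3.0$ of Example B in Section 10 falls in the upper tail.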


6. MEANS AND VARIANCES OF T, U, AND V


A few remarks should be made about the sampling distributions of the three quotients. From the type of population distribution it is immediately clear that, with increasing sample sizes, the quotient $T$ tends with probability 1 to 0, while $U$ and $V$ in the same way tend to 1. In the limit the distributions reduce to degenerate distributions concentrated at $T = 0$, $U = 1$, and $V = 1$.

As for the means, we find the general expressions

$$\mu'_1(T) = \frac{N_1 - 1}{N_1 - 2}\cdot\frac{1}{N_2 + 1}, \tag{12a}$$

$$\mu'_1(U) = \frac{N_1 - 1}{N_1 - 2}\cdot\frac{N_2 - 1}{N_2 + 1}, \tag{12b}$$

$$\mu'_1(V) = \frac{N_1 - 1}{N_1 - 2}\cdot\frac{N_2}{N_2 + 1}. \tag{12c}$$

The variances are

$$\mu_2(T) = \frac{(N_1 - 1)\left[(N_1 - 2)^2 N_2 + (N_2 + 2)\right]}{(N_1 - 3)(N_1 - 2)^2(N_2 + 1)^2(N_2 + 2)}, \tag{13a}$$

$$\mu_2(U) = \frac{(N_1 - 1)(N_2 - 1)\left[2(N_1 - 2)^2 + N_2(N_2 + 1) - 2\right]}{(N_1 - 3)(N_1 - 2)^2(N_2 + 1)^2(N_2 + 2)}, \tag{13b}$$

$$\mu_2(V) = \frac{(N_1 - 1)N_2\left[(N_1 - 2)^2 + (N_2 + 1)^2 - 1\right]}{(N_1 - 3)(N_1 - 2)^2(N_2 + 1)^2(N_2 + 2)}. \tag{13c}$$

The formulas conform to the statement just given.

The form of the distributions is illustrated in Diagram 1 for sample sizes $N_1 = N_2 = 4$.
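A rough Monte Carlo check of the means (12a)-(12c) is sketched below in Python. The assumptions: $B = 1$, and draws are kept only when the sample of size $N_1$ has the smaller minimum, so that it plays the role of sample 1 as in the derivations of Sections 3-5. The seed, replication count and sample sizes are arbitrary choices.

```python
import random


def simulate_tuv_means(n1, n2, reps=40_000, seed=2):
    """Monte Carlo means of T, U, V for two rectangular(0, 1) samples, keeping
    only draws in which the sample of size n1 has the smaller minimum (so that
    it is 'sample 1', as the derivation assumes)."""
    rng = random.Random(seed)
    sums = [0.0, 0.0, 0.0]
    kept = 0
    for _ in range(reps):
        s1 = [rng.random() for _ in range(n1)]
        s2 = [rng.random() for _ in range(n2)]
        u1, v1, u2, v2 = min(s1), max(s1), min(s2), max(s2)
        if u2 < u1:
            continue  # the size-n2 sample would be sample 1; discard this draw
        r11 = v1 - u1
        sums[0] += (u2 - u1) / r11
        sums[1] += (v2 - u2) / r11
        sums[2] += (v2 - u1) / r11
        kept += 1
    return [total / kept for total in sums]


def mean_formulas(n1, n2):
    """Means (12a)-(12c); finite only for n1 > 2."""
    a = (n1 - 1) / (n1 - 2)
    return a / (n2 + 1), a * (n2 - 1) / (n2 + 1), a * n2 / (n2 + 1)
```

For $N_1 = N_2 = 5$ the simulated means land close to 0.222, 0.889 and 1.111, and the exact means satisfy $\mu'_1(T) + \mu'_1(U) = \mu'_1(V)$, as (6) requires.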


7. RELATION OF U TO RIDER’S u 


The procedure of selecting the sample with the smaller lower extreme as a kind of origin constitutes a difference from the procedure used by Rider [3] in testing the range-ratio $u = R_1/R_2$, where $R_1$ and $R_2$ are the ranges of the two samples taken in a fixed order.

It is easily verified that the probability of $u_1 \le u_2$ is $N_1/(N_1 + N_2)$, while the probability of $u_2 \le u_1$ is $N_2/(N_1 + N_2)$. From this the relation between the distributions of $U$ and $u$ is found in the following way.

If we distinguish the two distributions (10a) and (10b) as $f_{01}(U)$ and $f_{1\infty}(U)$, we may derive their complementary functions by exchanging $N_1$ and $N_2$ and at the same time changing $U$ into $1/U$. The two new distributions may be denoted $g_{01}(U)$ and $g_{1\infty}(U)$.

Now the two distributions of $u = R_1/R_2$, as derived by Rider, are obtained by a simple weighting of these functions with the weights $N_1/(N_1 + N_2)$ and $N_2/(N_1 + N_2)$:

$$f(u) = \frac{N_1(N_1 - 1)N_2(N_2 - 1)}{(N_1 + N_2)(N_1 + N_2 - 1)(N_1 + N_2 - 2)}\left[(N_1 + N_2)u^{N_1 - 2} - (N_1 + N_2 - 2)u^{N_1 - 1}\right], \qquad 0 \le u \le 1, \tag{14a}$$

$$f(u) = \frac{N_1(N_1 - 1)N_2(N_2 - 1)}{(N_1 + N_2)(N_1 + N_2 - 1)(N_1 + N_2 - 2)}\left[(N_1 + N_2)u^{-N_2} - (N_1 + N_2 - 2)u^{-N_2 - 1}\right], \qquad 1 \le u < \infty. \tag{14b}$$
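The weighting argument can be checked against a direct simulation of Rider's ratio $u = R_1/R_2$. The Python sketch below integrates (14a) to get $P(u \le t)$ for $0 \le t \le 1$ and compares it with a Monte Carlo estimate for rectangular (0, 1) samples. The sample sizes $N_1 = 5$, $N_2 = 10$ (those of Rider's data in Section 10) and the replication count are illustrative choices.

```python
import random


def rider_cdf_below_one(t, n1, n2):
    """P(u <= t) for 0 <= t <= 1, integrating (14a); u = R1/R2."""
    s = n1 + n2
    c = n1 * (n1 - 1) * n2 * (n2 - 1) / (s * (s - 1) * (s - 2))
    return c * (s * t ** (n1 - 1) / (n1 - 1) - (s - 2) * t ** n1 / n1)


def simulated_cdf(t, n1, n2, reps=20_000, seed=3):
    """Monte Carlo estimate of P(R1/R2 <= t) for rectangular(0, 1) samples."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        a = [rng.random() for _ in range(n1)]
        b = [rng.random() for _ in range(n2)]
        hits += (max(a) - min(a)) / (max(b) - min(b)) <= t
    return hits / reps
```

For equal sample sizes the closed form gives $P(u \le 1) = 1/2$ exactly, as symmetry demands.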










DIAGRAM 1. The distribution of $T$, $U$, $V$ for $N_1 = N_2 = 4$. The shaded areas indicate: for $T$, the upper 5 per cent tail; for $U$ and $V$, the lower and the upper 2.5 per cent tails.


A similar weighting could, of course, be made with the distributions of $T$ and $V$. Because of the complication arising from the signs of the differences $u_i - u_j$ and $v_i - u_j$ $(i, j = 1, 2)$, it is considered preferable to use the procedure adopted here, i.e., selecting the smaller of the two lower extremes as a starting point.


8. DISCUSSION OF THE TESTS 


The three quotients $T$, $U$, and $V$ may be briefly characterized in the following way:

$T$ gives a test for possible differences in location;

$U$ gives a test for possible differences in variation;

$V$ gives a test for possible differences in location or in variation or in both.

The use of a range-ratio in studying differences in variances may not 
call for any specific justification. As for the use of extreme values in 
comparing differences in location, however, there are obviously quite a 
number of possible characteristics with a higher or lower degree of 
efficiency. 

The general desirability of using a most efficient statistic is very 
often hindered by the mathematical complexity in deriving the sam- 
pling distributions. On the other hand, the inefficiency of a statistic 
may sometimes be balanced by the simplicity of its calculation and 
use, as is the case in several types of routine work such as, e.g., statisti- 
cal quality control. Although the extreme values usually are very in- 
efficient statistics, they are in some respects good when dealing with 
samples from rectangular universes. The power efficiency of the proposed statistics and some related statistics will be studied.

In this connection the question arises how other population forms might affect the sampling distributions of $T$, $U$, and $V$, and the relative efficiency. It is, e.g., of interest to know the critical values in the case
of a normal or nearly-normal universe. These questions are being stud- 
ied for a set of different frequency functions covering a variety of pop- 
ulation types. 

From preliminary results arising from these investigations it may 
be noted here that very skew population distributions give rise to con- 
siderable deviations in the sampling distributions and hence in the 
critical values. For symmetrical and moderately asymmetrical uni- 
verses, the critical values show fairly small deviations from those ob- 
tained for the rectangular case. 

Because V includes the effects of sampling variations in both T and 
U, it may obviously, under certain conditions, fail to reveal existing 
large deviations of the two addends, namely, whenever the deviations 
of T and U are in opposite directions. On the other hand, a significant 
value of V may be due to large although non-significant values of T 
and U. When using V, one must therefore interpret the results with 
more care than is necessary when applying the simple T- and U-tests. 

As already indicated the three tests presented here are primarily 
thought of as being useful in routine work in statistical quality con- 
trol. Under such circumstances it may sometimes be considered useful 
to apply the V-test as a first guide in “hunting for troubles.” Usually, 
however, it seems better to use the separate T- and U-tests. The tests 










might accordingly be referred to as the $TU$-tests or, if the $V$-test is also being considered, the $TUV$-tests.


9. TABLES 


In Tables 1-3 are given percentage points of the distributions of $T$, $U$, and $V$ for sample sizes from 2 to 10. It is to be noted that one should use two-tail tests with regard to $U$ and $V$, while $T$ is to be used as a one-tail test. The critical values are consequently given by means of the 10, 5, and 1 percentage points for $T$, and by means of the 99.5, 97.5, 95, 5, 2.5, and 0.5 percentage points for $U$ and $V$. (See Diagram 1.)


10. NUMERICAL EXAMPLES

In order to illustrate the use of the $TUV$-tests, a few examples are given.

A. Applying the tests to the data used by Rider [3], we have

$N_1 = 5$, $u_1 = 72$, $v_1 = 80$,
$N_2 = 10$, $u_2 = 75$, $v_2 = 79$.

From these values we obtain

$T = 0.375$, $U = 0.500$, $V = 0.875$.

It is found from the tables that $T$ is significantly large at the .05 level, while $U$ is significantly low at the .01 level. The last result is, of course, in agreement with that which was shown by Rider when using his $u$-test. The use of the $TU$-tests thus reveals more differences than could be seen by the $u$-test alone.

The fact that $T$ and $U$ deviate in opposite directions leads to the result that $V$ does not give any significant indication even at the .10 level.

B. In the factory control of manufacturing ball bearings at the S.K.F. in Gothenburg, the groove location was measured from a certain specified norm. Samples of 4 being taken, the following data were obtained on two different occasions:

Sample A: 4, 3, 0, 1
Sample B: 0, -1, 0, -2

Sample B being our sample 1, we obtain

$T = 1.0$, $U = 2.0$, $V = 3.0$.


It is found that there is a shift in location, significant at the 5 per 
































TABLE 1. UPPER PERCENTAGE POINTS OF T

Given two samples from continuous rectangular populations, call the sample with the smaller minimum observation the first sample. Let

$N_i$ = number of observations in the $i$-th sample,
$u_i$ = minimum observation in the $i$-th sample,
$v_i$ = maximum observation in the $i$-th sample.

Then $T = (u_2 - u_1)/(v_1 - u_1)$.

            Value of T exceeded with probability
 N1   N2     0.10     0.05     0.01
  2    2     3.33     6.67     33.3
       3     2.50     5.00     25.0
       4     2.00     4.00     20.0
       5     1.67     3.33     16.7
       6     1.43     2.86     14.3
       8     1.11     2.22     11.1
      10     0.91     1.82     9.09
  3    2     1.29     1.83     4.08
       3     1.00     1.41     3.16
       4     0.82     1.15     2.57
       5     0.69     0.98     2.18
       6     0.59     0.85     1.89
       8     0.47     0.66     1.49
      10     0.39     0.55     1.23
  4    2     1.00     1.26     2.15
       3     0.79     1.00     1.71
       4     0.65     0.83     1.42
       5     0.55     0.71     1.21
       6     0.46     0.62     1.06
       8     0.37     0.49     0.85
      10     0.31     0.40     0.71
  5    2     0.90     1.07     1.61
       3     0.71     0.87     1.30
       4     0.58     0.72     1.09
       5     0.49     0.62     0.95
       6     0.42     0.54     0.83
       8     0.33     0.43     0.67
      10     0.27     0.36     0.58
  6    2     0.85     0.99     1.37
       3     0.67     0.81     1.12
       4     0.55     0.67     0.95
       5     0.46     0.57     0.83
       6     0.40     0.50     0.74
       8     0.31     0.40     0.60
      10     0.26     0.33     0.51
  8    2     0.80     0.91     1.16
       3     0.62     0.75     0.97
       4     0.51     0.62     0.84
       5     0.43     0.53     0.74
       6     0.37     0.46     0.65
       8     0.29     0.37     0.53
      10     0.24     0.31     0.45
 10    2     0.77     0.88     1.07
       3     0.60     0.72     0.91
       4     0.49     0.60     0.79
       5     0.42     0.51     0.70
       6     0.36     0.45     0.62
       8     0.28     0.35     0.51
      10     0.23     0.29     0.42
































































TABLE 2. UPPER AND LOWER PERCENTAGE POINTS OF U

Using the same notation as for Table 1, $U = (v_2 - u_2)/(v_1 - u_1)$.

                   Values of U exceeded with probability
 N1   N2    0.995    0.975    0.95     0.05     0.025    0.005
  2    2    0.01     0.03     0.05     6.67     13.3     66.7
       3    0.07     0.17     0.24     10.0     20.0     100.0
       4    0.18     0.31     0.40     12.0     24.0     120.0
       5    0.28     0.43     0.53     13.3     26.7     133.0
       6    0.37     0.53     0.62     14.3     28.6     143.0
       8    0.50     0.65     0.74     15.6     31.1     156.0
      10    0.60     0.74     0.81     16.4     32.7     164.0
  3    2    0.00     0.02     0.04     1.82     2.58     5.77
       3    0.06     0.13     0.19     2.45     3.46     7.75
       4    0.15     0.26     0.33     2.83     4.00     8.94
       5    0.24     0.38     0.46     3.09     4.36     9.76
       6    0.33     0.47     0.55     3.27     4.63     10.4
       8    0.46     0.60     0.67     3.53     4.99     11.2
      10    0.56     0.68     0.75     3.69     5.22     11.7
  4    2    0.00     0.02     0.03     1.26     1.59     2.71
       3    0.05     0.12     0.18     1.59     2.00     3.42
       4    0.14     0.25     0.31     1.79     2.25     3.85
       5    0.23     0.35     0.43     1.93     2.43     4.15
       6    0.31     0.44     0.51     2.03     2.55     4.37
       8    0.44     0.57     0.64     2.17     2.73     4.67
      10    0.54     0.66     0.72     2.26     2.84     4.87
  5    2    0.00     0.02     0.03     1.07     1.28     1.91
       3    0.05     0.12     0.17     1.30     1.55     2.31
       4    0.13     0.23     0.30     1.44     1.71     2.56
       5    0.22     0.34     0.41     1.54     1.83     2.73
       6    0.30     0.42     0.50     1.61     1.91     2.86
       8    0.43     0.55     0.62     1.71     2.03     3.04
      10    0.52     0.64     0.70     1.77     2.11     3.15
  6    2    0.00     0.02     0.03     0.99     1.14     1.57
       3    0.05     0.11     0.16     1.17     1.34     1.85
       4    0.13     0.23     0.29     1.27     1.46     2.02
       5    0.22     0.33     0.40     1.35     1.55     2.16
       6    0.29     0.41     0.48     1.40     1.61     2.23
       8    0.42     0.54     0.60     1.48     1.70     2.35
      10    0.51     0.63     0.68     1.54     1.77     2.44
  8    2    0.00     0.01     0.03     0.92     1.02     1.28
       3    0.05     0.11     0.15     1.04     1.15     1.45
       4    0.13     0.22     0.28     1.12     1.23     1.55
       5    0.21     0.32     0.39     1.17     1.29     1.63
       6    0.28     0.40     0.47     1.21     1.34     1.68
       8    0.41     0.52     0.59     1.27     1.40     1.77
      10    0.50     0.61     0.67     1.31     1.45     1.82
 10    2    0.00     0.01     0.03     0.89     0.97     1.15
       3    0.05     0.10     0.15     0.99     1.07     1.28
       4    0.12     0.21     0.27     1.05     1.13     1.36
       5    0.20     0.31     0.38     1.09     1.18     1.41
       6    0.28     0.39     0.46     1.12     1.21     1.45
       8    0.40     0.52     0.58     1.17     1.26     1.51
      10    0.49     0.60     0.66     1.20     1.30     1.55

















TABLE 3. UPPER PERCENTAGE POINTS OF V

Using the same notation as for Table 1.

[Table of values of V exceeded with given probabilities; the numerical entries are not recoverable from the source scan.]





USE OF RANGES, CROSS-RANGES AND EXTREMES 545 


cent level, as well as a slight shift in variation, significant at the 10 per 
cent level. The cross-range test V is also significantly high. 

C. Data on production of metal knobs (W. B. Rice, Control Charts 
in Factory Management, Tables 4 and 5). Four polished metal knobs 
were measured, in 1/1000 inch, for two kinds of steel with different 
hardness. The following values were observed: 


Sample 26 742, 744, 742, 737 
Sample 31 749, 744, 749, 747 


By calculating $\bar{x}$ and $s^2$ we find by ordinary methods that the variances do not differ significantly and that the averages differ at the 5 per cent level ($t = 3.15$ with $t_{.05} = 2.45$).

Using the more rapid T- and U-tests we have T = 1.00, U = 0.71. The tables
show that there is a difference in location at the 5 per cent level but no 
difference in variation. The findings are consequently in accordance 
with what was obtained in the more laborious way by the t- and F-tests. 
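The arithmetic of this example can be re-run as a check. The Python sketch below recomputes the pooled two-sample t, the variance ratio and U from the eight knob readings. It assumes U is the range of the second sample divided by the range of the first, which reproduces the quoted U = 0.71; T is not recomputed, because its definition belongs to Table 1 and is not restated here.

```python
from math import sqrt
from statistics import mean, variance

# Knob measurements in 1/1000 inch, as quoted in the text.
sample_26 = [742, 744, 742, 737]
sample_31 = [749, 744, 749, 747]


def pooled_t(a, b):
    """Two-sample Student t with pooled variance (len(a) + len(b) - 2 d.f.)."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    return (mean(b) - mean(a)) / sqrt(pooled_var * (1 / n1 + 1 / n2))


def sample_range(xs):
    return max(xs) - min(xs)


t = pooled_t(sample_26, sample_31)                   # compare with t_.05 = 2.45 (6 d.f.)
f_ratio = variance(sample_26) / variance(sample_31)  # ordinary variance-ratio statistic
# Assumption: U = (range of second sample) / (range of first sample).
u = sample_range(sample_31) / sample_range(sample_26)

print(f"t = {t:.2f}, F = {f_ratio:.2f}, U = {u:.2f}")  # t = 3.15, F = 1.60, U = 0.71
```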


REFERENCES 


[1] Link, Richard F., “The sampling distribution of the ratio of two ranges from 
independent samples,” Annals of Mathematical Statistics, 21 (1950), 112-16. 

[2] Mosteller, Frederick, “On some useful ‘inefficient’ statistics,” Annals of 
Mathematical Statistics, 17 (1946), 377-408. 

[3] Rider, Paul R., “The distribution of the quotient of ranges in samples from 
a rectangular population,” Journal of the American Statistical Association, 46 
(1951), 375-78. 

[4] Walsh, John E., “On the range-midrange test and some tests with bounded
significance levels,” Annals of Mathematical Statistics, 20 (1949), 257-67.





THE DISTRIBUTION OF THE PRODUCT OF RANGES 
IN SAMPLES FROM A RECTANGULAR POPULATION 


PAUL R. RIDER
Washington University and Wright-Patterson Air Force Base 


INTRODUCTION AND SUMMARY 


In an earlier paper [4] the distribution of the quotient of the ranges
of two independent, random samples from a continuous rectangular
population was given and discussed. (See also [3].) In the present paper
the distribution of the product of such ranges is derived.

For the distribution of the product of the ranges of two independent 
samples a general formula is given. This formula fails to hold if the 
sample sizes are the same or if they differ by unity, and special con- 
sideration has to be given to these two cases. 

The distribution of the product of the ranges of k independent sam- 
ples of equal size is derived. 

Simple formulas for the moments about the origin are given for all 
of these distributions. 


PRODUCT OF TWO RANGES 
The population which we wish to consider is the rectangular population given by

$$f(x) = \begin{cases} 1 & \text{for } 0 \le x \le 1, \\ 0 & \text{elsewhere.} \end{cases} \qquad (1)$$

For the purpose of keeping formulas simpler, a unit interval for the 
independent variable has been chosen. While the interval over which 
the population extends has no effect on the distribution of quotients of 
ranges, it will obviously have an effect on the distribution of products 
of ranges. However, there is no essential loss of generality in assuming 
that this interval is unity. 

It is well known that the distribution of ranges in samples of size n 
from the population (1) is 


$$n(n-1)\,x^{n-2}(1-x)\,dx. \qquad (2)$$
(See, for example, [2], page 192. In this reference the distribution is 
given in cumulative form, which must be differentiated to yield (2).) 


If $x_1$ is the range of a sample of size $m$ and $x_2$ the range of a sample of size $n$ from (1), then, if the samples are independent, the joint distribution of $x_1$ and $x_2$ is

$$m(m-1)n(n-1)\,x_1^{m-2}x_2^{n-2}(1-x_1)(1-x_2)\,dx_1\,dx_2. \qquad (3)$$

We wish to determine the distribution of the product $u = x_1x_2$.

To do this we first replace, in (3), $x_2$ by $x_1^{-1}u$ and $dx_2$ by $x_1^{-1}du$. This yields

$$m(m-1)n(n-1)\,u^{n-2}x_1^{m-n-1}(1-x_1)(1-x_1^{-1}u)\,dx_1\,du. \qquad (4)$$
To get the distribution of $u$ we integrate (4) with respect to $x_1$ from $u$ to 1, since these values are the minimum and the maximum, respectively, that $x_1$ can attain. After some simplification, the distribution of $u$ is found to be

$$\frac{m(m-1)n(n-1)}{(m-n)[(m-n)^2-1]}\Big\{[(m-n+1)-(m-n-1)u]\,u^{m-2} + [(m-n-1)-(m-n+1)u]\,u^{n-2}\Big\}\,du, \qquad (5)$$

in which $m - n$ is different from 0 and $\pm 1$.

Since, if the samples are of unequal size, it is immaterial which is larger, we shall assume $m \ge n$ and consider the cases $m - n = 0$ and $m - n = 1$, that is, the case in which the samples are of equal size and the case in which one sample contains one more item than the other.

For the case $m - n = 0$, we find, by using the same method, that the distribution of $u$ is

$$-n^2(n-1)^2\,u^{n-2}[2(1-u) + (1+u)\log u]\,du. \qquad (6)$$

Similarly, for the case $m - n = 1$, we get

$$n^2(n^2-1)\,u^{n-2}\left(\tfrac12 - \tfrac12 u^2 + u\log u\right)du. \qquad (7)$$
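Because each of (5), (6) and (7) must have the same moments as the product of two independent ranges, the three densities can be checked exactly. The sketch below uses exact rational arithmetic: it integrates each density term by term with the standard result that the integral of $u^a(\log u)^r$ over (0, 1) is $(-1)^r r!/(a+1)^{r+1}$, and compares the outcome with the product of the moments of two independent ranges, which is the expression in (12) below. The case j = 0 confirms that each density integrates to 1. The function names are illustrative only.

```python
from fractions import Fraction
from math import factorial


def ilog(a, r):
    """Exact value of the integral over (0, 1) of u**a * (log u)**r du, for a > -1."""
    return Fraction((-1) ** r * factorial(r), (a + 1) ** (r + 1))


def moment_eq5(m, n, j):
    """j-th moment about the origin of density (5); assumes m - n >= 2."""
    d = m - n
    coef = Fraction(m * (m - 1) * n * (n - 1), d * (d * d - 1))
    return coef * ((d + 1) * ilog(m - 2 + j, 0) - (d - 1) * ilog(m - 1 + j, 0)
                   + (d - 1) * ilog(n - 2 + j, 0) - (d + 1) * ilog(n - 1 + j, 0))


def moment_eq6(n, j):
    """j-th moment of density (6), the case m = n."""
    bracket = (2 * (ilog(n - 2 + j, 0) - ilog(n - 1 + j, 0))
               + ilog(n - 2 + j, 1) + ilog(n - 1 + j, 1))
    return -((n * (n - 1)) ** 2) * bracket


def moment_eq7(n, j):
    """j-th moment of density (7), the case m = n + 1."""
    bracket = (Fraction(1, 2) * (ilog(n - 2 + j, 0) - ilog(n + j, 0))
               + ilog(n - 1 + j, 1))
    return n * n * (n * n - 1) * bracket


def moment_of_product(m, n, j):
    """Product of the j-th moments of two independent ranges of sizes m and n."""
    return Fraction(m * (m - 1) * n * (n - 1),
                    (m + j) * (m + j - 1) * (n + j) * (n + j - 1))


for n in range(2, 7):
    for j in range(4):  # j = 0 checks that each density integrates to 1
        assert moment_eq6(n, j) == moment_of_product(n, n, j)
        assert moment_eq7(n, j) == moment_of_product(n + 1, n, j)
        for m in range(n + 2, n + 6):
            assert moment_eq5(m, n, j) == moment_of_product(m, n, j)
print("densities (5), (6) and (7) reproduce the moments of the product")
```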


PRODUCT OF k RANGES 


We shall now consider the product of k ranges, limiting the discus- 
sion to the case of equal sample sizes. 

To derive the distribution of the product of three ranges in independent samples from the population (1) we replace the variable $u$ in (6) by $y$ and multiply the result by (2). This gives

$$-n^3(n-1)^3\,x^{n-2}(1-x)\,y^{n-2}[2(1-y) + (1+y)\log y]\,dx\,dy. \qquad (8)$$


Since $y$ is the product of the ranges of two independent samples and $x$ is the range of a sample which is independent of both of the samples involved in the product $y$, (8) gives the joint distribution of $x$ and $y$.

548 AMERICAN STATISTICAL ASSOCIATION JOURNAL, SEPTEMBER 1953

If we let $u = xy$, then $u$ is the product of the ranges of three independent samples from (1). We replace, in (8), $x$ by $y^{-1}u$ and $dx$ by $y^{-1}du$. The distribution of $u$ can now be found by integrating with respect to $y$ from $y = u$ to $y = 1$.

Repeating this process again and again, we find that the distribution of the product $u$ of $k$ ranges of independent samples of size $n$ from the rectangular population (1) is given by

$$(-1)^{k-1}n^k(n-1)^k\,u^{n-2}\Big[a_0(1-u) + a_1(1+u)\log u + a_2(1-u)\log^2 u + \cdots + a_{k-1}\{1+(-1)^k u\}\log^{k-1}u\Big]\,du, \qquad (9)$$

in which

$$a_{k-1} = \frac{1}{(k-1)!}, \quad a_{k-2} = \frac{k}{(k-2)!}, \quad \ldots, \quad a_{k-r} = \frac{(k+r-2)!}{(k-1)!\,(r-1)!\,(k-r)!}, \quad \ldots, \quad a_0 = \frac{(2k-2)!}{[(k-1)!]^2}. \qquad (10)$$

The result can be proved by mathematical induction.

The distribution for $k = 2$ is given by (6). The distributions for $k = 3, 4, 5, 6$ follow.

$$n^3(n-1)^3u^{n-2}\Big[6(1-u) + 3(1+u)\log u + \tfrac12(1-u)\log^2 u\Big]du,$$

$$-n^4(n-1)^4u^{n-2}\Big[20(1-u) + 10(1+u)\log u + 2(1-u)\log^2 u + \tfrac16(1+u)\log^3 u\Big]du,$$

$$n^5(n-1)^5u^{n-2}\Big[70(1-u) + 35(1+u)\log u + \tfrac{15}{2}(1-u)\log^2 u + \tfrac56(1+u)\log^3 u + \tfrac{1}{24}(1-u)\log^4 u\Big]du,$$

$$-n^6(n-1)^6u^{n-2}\Big[252(1-u) + 126(1+u)\log u + 28(1-u)\log^2 u + \tfrac72(1+u)\log^3 u + \tfrac14(1-u)\log^4 u + \tfrac{1}{120}(1+u)\log^5 u\Big]du. \qquad (11)$$
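The general result (9) and (10) can be checked the same way. The sketch below builds the coefficients from (10), written with q in place of r to avoid a clash with the summation index of (9); it reproduces the bracketed coefficients listed in (11) for k = 5 and 6, and confirms in exact rational arithmetic that the density (9) has the moments (14) given below (j = 0 is the normalisation). It is a numerical check of the printed formulas, not a substitute for the induction proof.

```python
from fractions import Fraction
from math import factorial


def log_power_integral(a, r):
    """Exact value of the integral over (0, 1) of u**a * (log u)**r du, for a > -1."""
    return Fraction((-1) ** r * factorial(r), (a + 1) ** (r + 1))


def coefficient(k, r):
    """a_r of (10), via a_{k-q} = (k+q-2)! / [(k-1)! (q-1)! (k-q)!] with q = k - r."""
    q = k - r
    return Fraction(factorial(k + q - 2),
                    factorial(k - 1) * factorial(q - 1) * factorial(k - q))


def moment_from_density(k, n, j):
    """j-th moment about the origin of density (9), integrated term by term."""
    total = Fraction(0)
    for r in range(k):
        sign = (-1) ** (r + 1)  # term r carries the factor {1 + (-1)**(r+1) u}
        total += coefficient(k, r) * (log_power_integral(n - 2 + j, r)
                                      + sign * log_power_integral(n - 1 + j, r))
    return (-1) ** (k - 1) * (n * (n - 1)) ** k * total


def moment_formula_14(k, n, j):
    """Formula (14): n^k (n-1)^k / [(n+j)^k (n+j-1)^k]."""
    return Fraction(n * (n - 1), (n + j) * (n + j - 1)) ** k


print([coefficient(5, r) for r in range(5)])  # 70, 35, 15/2, 5/6, 1/24 as in (11)
print(all(moment_from_density(k, n, j) == moment_formula_14(k, n, j)
          for k in range(2, 7) for n in range(2, 6) for j in range(4)))
```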


MOMENTS 


For the distribution of the product of the ranges of two independent samples of size $m$ and $n$ respectively, with $m - n > 1$, the moment of order $j$ about the origin is

$$\mu_j' = \frac{m(m-1)n(n-1)}{(m+j)(m+j-1)(n+j)(n+j-1)}. \qquad (12)$$

If $m - n = 1$, then

$$\mu_j' = \frac{n^2(n^2-1)}{(n+j)^2[(n+j)^2-1]}. \qquad (13)$$

For the distribution of the product of the ranges of $k$ independent samples of $n$ items each, the $j$th moment about the origin is

$$\mu_j' = \frac{n^k(n-1)^k}{(n+j)^k(n+j-1)^k}. \qquad (14)$$

The formulas for the central moments are not simple and consequently are of no particular interest.


REFERENCES 


[1] Aroian, Leo A., “The probability function of the product of two normally dis- 
tributed variables,” Annals of Mathematical Statistics, 18 (1947), 265-71. 

[2] Kenney, J. F., and Keeping, E. S., Mathematics of Statistics, Part II, 2nd
edition. New York: D. Van Nostrand Company, Inc., 1951.

[3] Link, Richard F., “Correction to ‘The sampling distribution of the ratio of 
two ranges from independent samples’,” Annals of Mathematical Statistics, 23 
(1952), 298-99. 

[4] Rider, Paul R., “The distribution of the quotient of ranges in samples from 
a rectangular population,” Journal of the American Statistical Association, 46 
(1951), 502-7. 

[5] Shellard, G. D., “Estimating the product of several random variables,” Jour- 
nal of the American Statistical Association, 47 (1952), 216-21. 














CONFIDENCE AND TOLERANCE INTERVALS FOR 
THE NORMAL DISTRIBUTION* 


FRANK PROSCHAN 
National Bureau of Standards†


Confidence and tolerance intervals for the normal distribu- 
tion are presented for the various cases of known and un- 
known mean and standard deviation. Practical illustration 
and interpretation of these intervals are given. Tables are 
presented permitting a comparison among the intervals. Fi- 
nally the relationship between the two types of intervals is 
described. 


1. Introduction. Discussions of the theory of errors will sometimes 
state that the mean plus or minus the probable error will include 50% 
of future observations (assumed normally distributed). This, of course, 
is true only if the mean and the probable error of the population itself 
are used. Unfortunately, in most practical problems one or both of 
these may not be known. Experimenters who use the sample mean plus 
or minus the sample probable error with the expectation that this 
interval will contain 50% of future observations may be seriously delud- 
ing themselves. 

However, it is possible to construct intervals of the type $\bar{x} \pm ks$ ($\bar{x}$ = sample mean, $s$ = sample standard deviation) which will, on the average, include 50% of the population. From this one is led to a more general consideration of such intervals, and to the uses to which they can be put.

All populations discussed in this paper are normal unless otherwise specified. Let $\mu$, $\sigma$ refer to the population mean and standard deviation respectively.

Any one of four possible situations may exist: (a) $\mu$, $\sigma$ both known; (b) $\mu$ unknown, $\sigma$ known; (c) $\mu$ known, $\sigma$ unknown; (d) $\mu$, $\sigma$ both unknown.

Let $m$ represent either $\mu$ or $\bar{x}$; let s.d. represent either $\sigma$ or $s$. Then two important types of assertions may be made about intervals of the form

$$m \pm k\,\text{s.d.} \qquad (1)$$

A. Confidence interval. The probability is $\gamma$ that the interval (1) contains the population mean (or, alternatively, the second sample mean).





* Presented at the annual meeting of the American Statistical Association, Boston, December 1951. 
† Now at Syracuse Electric Products, Inc., Hicksville, N. Y.









B. Tolerance interval. In a large series of repeated samples the proportion of the population contained in (1) is

(B1) $\alpha$, on the average;

(B2) $P$ or more, $\gamma$ of the time.

In this paper, a comparison is made among the values of k appro- 
priate to the respective cases obtained from various combinations of 
A and B with (a), (b), (c), and (d). Practical illustrations and inter- 
pretations are given of these cases. 

In addition, details are given of a proof of a result by Wilks (1941) 
for the case B1. These details are given because they are suggestive of 
a general method applicable in such problems. Also, tables are presented 
of values of k for a certain class of confidence and tolerance intervals. 

Finally, the relationship between confidence intervals and tolerance 
intervals is discussed. 

2. Definition of symbols. For convenience, the definitions of symbols are brought together in this section.

$\mu$ = population mean

$\sigma$ = population standard deviation

$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ = mean of a sample of $n$ observations

$s = \sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2/(n-1)}$ = sample standard deviation

$m$ = $\mu$ or $\bar{x}$

s.d. = $\sigma$ or $s$

$p$ = proportion of the population contained in $m \pm k$ s.d., where $k$ = constant

Given a normal distribution with $\mu = 0$, $\sigma = 1$. Then

$L_\alpha$ = normal curve deviate which is exceeded in absolute value with probability $\alpha$

$t_{\alpha,n-1}$ = Student-t value for $n-1$ degrees of freedom which is exceeded in absolute value with probability $\alpha$

$\chi^2_{\alpha,n-1}$ = Chi-square value with $n-1$ degrees of freedom which is exceeded with probability $\alpha$.

3. Confidence intervals. A chemist makes $n$ determinations of the iron content, $\mu$, of a solution. What interval shall he select so that he can assert with 50% confidence that $\mu$ lies within that interval? The distribution of observations is normal with mean $\mu$.

3.1 For the population mean, standard deviation known. First consider the case where the chemist knows $\sigma$. (The determination is of a routine type, for which a great many sets of previous observations are available, from which $\sigma$ is calculated.) In this case

$$\bar{x} \pm \frac{.6745}{\sqrt{n}}\,\sigma$$

will contain the “true” value (population mean) 50% of the time.
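This may be seen from the following diagram:

[Diagram: the interval AB, $\mu \pm (.6745/\sqrt{n})\sigma$, and the interval CD, $\bar{x} \pm (.6745/\sqrt{n})\sigma$, drawn on a common axis.]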


Lay off AB: $\mu \pm (.6745/\sqrt{n})\sigma$, and CD: $\bar{x} \pm (.6745/\sqrt{n})\sigma$. Notice that when $\bar{x}$ lies in AB, $\mu$ must of necessity lie in CD; and when $\bar{x}$ does not lie in AB, $\mu$ must lie outside of CD. But since $\bar{x}$ is normally distributed with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$, the probability is .50 that $\bar{x}$ will lie in AB. Hence the probability is .50 that CD contains $\mu$.

Values of $k_1 = .6745/\sqrt{n}$ for $n = 2(1)30, 40, 60, 120, \infty$ are presented in Table 1.
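A short simulation illustrates the argument: with $\sigma$ known, the interval $\bar{x} \pm (.6745/\sqrt{n})\sigma$ covers $\mu$ in about half of repeated samples, and the same construction with $L_{1-\gamma}$ gives coverage $\gamma$. The population values used below ($\mu$ = 10.0, $\sigma$ = 0.3, $n$ = 4) are hypothetical, chosen only for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def coverage_known_sigma(mu, sigma, n, gamma, trials, seed=1):
    """Fraction of simulated samples for which xbar +/- (L/sqrt(n)) * sigma contains mu."""
    rng = random.Random(seed)
    # L_{1-gamma}: standard normal deviate exceeded in absolute value with probability 1 - gamma
    L = NormalDist().inv_cdf(0.5 + gamma / 2)
    half_width = L * sigma / sqrt(n)
    hits = 0
    for _ in range(trials):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        hits += abs(xbar - mu) <= half_width
    return hits / trials


print(round(NormalDist().inv_cdf(0.75), 4))             # 0.6745, the factor used above
print(coverage_known_sigma(10.0, 0.3, 4, 0.50, 20000))  # close to .50
```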

To generalize, when the confidence coefficient is $\gamma$, the confidence interval for $\mu$ is

$$\bar{x} \pm \frac{L_{1-\gamma}}{\sqrt{n}}\,\sigma.$$


3.2 For the population mean, standard deviation unknown. Consider the case where the only information about $\sigma$ is in the present sample. Then the interval

$$\bar{x} \pm \frac{t_{.50,n-1}}{\sqrt{n}}\,s$$

will contain $\mu$ 50% of the time. The proof is similar to that of Section 3.1. Values of $k_2 = t_{.50,n-1}/\sqrt{n}$ for $n = 2(1)30, 40, 60, 120, \infty$ are presented in Table 1. Comparison of $k_1$ and $k_2$ shows $k_2 > k_1$, but as $n \to \infty$, $k_2 \to k_1$.

In general, when the confidence coefficient is $\gamma$, the confidence interval is

$$\bar{x} - t_{1-\gamma,n-1}\frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{1-\gamma,n-1}\frac{s}{\sqrt{n}}.$$

3.3 “Confidence interval” for second sample mean. Suppose the chemist who made the iron determinations wishes to set up a confidence interval, not for $\mu$, but for the mean, $\bar{x}_2$, of a second sample of $n_2$ observations. Such an interval might more appropriately be called a prediction interval, since the term “confidence interval” generally refers to population parameters.

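Let us call the mean of the first sample $\bar{x}_1$, and its size $n_1$. Since the statistic

$$t = \frac{\bar{x}_1 - \bar{x}_2}{s_1\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$

is distributed as the Student-t ratio with $n_1 - 1$ degrees of freedom, it follows that the interval

$$\bar{x}_1 \pm t_{.50,n_1-1}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}\;s_1 \qquad (2)$$

will constitute a 50% prediction interval for $\bar{x}_2$ [1].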

What does this mean? It simply means that if pairs of samples of sizes $n_1$ and $n_2$ respectively, with means $\bar{x}_{1i}$ and $\bar{x}_{2i}$ $(i = 1, 2, \ldots)$ respectively, are drawn repeatedly, then for 50% of these pairs $\bar{x}_{2i}$ will lie in

$$\bar{x}_{1i} \pm t_{.50,n_1-1}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}\;s_{1i}.$$

It does not mean that if one first sample of size $n_1$ with mean $\bar{x}_1$ is drawn, to be followed by the drawing of a great many “second” samples of






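TABLE 1
FACTORS FOR 50% CONFIDENCE INTERVALS

[Table with columns “For”, “σ known (K) or unknown (U)”, “Form of interval” and “For discussion see Section”, giving the factors $k_1$, $k_2$, $k_3$ for each $n$; the numerical entries are not recoverable from the source scan.]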










size $n_2$ with means $\bar{x}_{2i}$ $(i = 1, 2, \ldots)$, then for 50% of the “second” samples $\bar{x}_{2i}$ will lie in (2).

When $n_2 = n_1 = n$, the coefficient of $s_1$ in (2) becomes

$$k_3 = t_{.50,n-1}\sqrt{\frac{2}{n}}.$$

Values of $k_3$ for $n = 2(1)30, 40, 60, 120, \infty$ are given in Table 1 for purposes of comparison. Note that $k_3 = \sqrt{2}\,k_2$ simply.
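The distinction drawn above between repeated pairs of samples and one fixed first sample can be seen in a simulation. The sketch below assumes $n_1 = n_2 = 5$ and uses $t_{.50,4} \approx 0.741$ (the standard tabulated value; it also appears in Table 2 as the $n = 5$ entry for $\mu \pm k_7 s$, since $k_7 = t_{.50,n-1}$). Over repeated pairs the second mean falls in $\bar{x}_1 \pm k_3 s_1$ about half the time; for one fixed first sample, here a deliberately tight hypothetical one, the proportion captured can be far from 50%.

```python
import random
from math import sqrt
from statistics import fmean, stdev

T_50_4 = 0.741  # t_{.50,4}: exceeded in absolute value with probability .50 (4 d.f.)


def pair_coverage(n, t_value, trials, seed=2):
    """Fraction of independent (first, second) pairs of N(0, 1) samples, each of size n,
    for which the second mean lies in xbar1 +/- k3 * s1 with k3 = t_value * sqrt(2/n)."""
    rng = random.Random(seed)
    k3 = t_value * sqrt(2 / n)
    hits = 0
    for _ in range(trials):
        first = [rng.gauss(0.0, 1.0) for _ in range(n)]
        second_mean = fmean(rng.gauss(0.0, 1.0) for _ in range(n))
        hits += abs(second_mean - fmean(first)) <= k3 * stdev(first)
    return hits / trials


def coverage_given_first(first, t_value, trials, seed=3):
    """Fraction of many second samples (same size, N(0, 1)) whose mean lies in the
    interval built from ONE fixed first sample."""
    rng = random.Random(seed)
    n = len(first)
    centre, half_width = fmean(first), t_value * sqrt(2 / n) * stdev(first)
    hits = 0
    for _ in range(trials):
        second_mean = fmean(rng.gauss(0.0, 1.0) for _ in range(n))
        hits += abs(second_mean - centre) <= half_width
    return hits / trials


print(pair_coverage(5, T_50_4, 20000))  # close to .50
# A hypothetical, unusually tight first sample: far fewer than 50% of second means fall inside.
print(coverage_given_first([0.0, 0.05, -0.05, 0.02, -0.02], T_50_4, 5000))
```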

In general, if the “confidence” coefficient is to be $\gamma$ for $\bar{x}_2$, then the interval to be used is

$$\bar{x}_1 \pm t_{1-\gamma,n_1-1}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}\;s_1.$$

4. Tolerance intervals. In Section 3 an interval of type (1) was formed to contain the population mean (with a certain confidence). Suppose now that we are interested in setting up an interval of type (1) which will contain a certain proportion $p$ of the population. Such an interval is known as a tolerance interval.

If either $\mu$ or $\sigma$ is unknown, then the interval (1), containing $\bar{x}$ or $s$, is random. Hence the proportion $p$ contained in (1) will be a random variable.

4.1 Average value of $p$. In Section 4.1 we determine $k$ so that on the average the proportion $p$ is equal to $\alpha$, a constant. In Section 4.2 we determine $k$ so that in a large series of samples from normal universes a certain proportion $\gamma$ of the intervals (1) will include a proportion $P$ or more of the universe.

4.1.1 Population mean and standard deviation known. In this case

$$\mu \pm k\sigma \qquad (3)$$

may be used as a tolerance interval. The proportion $p$ contained in (3) is constant, and the appropriate value for specified $p$ may be obtained from a table of normal areas. Thus for $p = .50$, $k_4 = .6745$ (listed in Table 2 for purposes of comparison).

4.1.2 Population mean and standard deviation unknown. Unfortunately, in most practical problems $\mu$ and $\sigma$ are not known. Hence $\bar{x}$ and $s$ must be used. How shall we determine $k$ so that the average $p$ contained in $\bar{x}_i \pm ks_i$ $(i = 1, 2, \ldots)$ will be $\alpha$?

Wilks [8] gave a solution without presenting the details of the proof. (For an independent derivation see Appendix.) The solution states that the tolerance limits which will include, on the average, a proportion $\alpha$ of the normal universe are







$$\bar{x} \pm t_{1-\alpha,n-1}\sqrt{\frac{n+1}{n}}\;s. \qquad (4)$$

Values of $k_{5,\alpha} = t_{1-\alpha,n-1}\sqrt{(n+1)/n}$ for $n = 2(1)30, 40, 60, 120, \infty$ and for $\alpha = .50, .75, .90, .95, .98, .99, .999$ are given in Table 3. This table should be of use both to the experimenter and to the quality controller; it supplements the values of $k$ given in [3]. In addition, for purposes of comparison, Table 2 gives values of $k_{5,.50}$ for $n = 2(1)30, 40, 60, 120, \infty$.

An example of the use of Table 3 is given: 

Example: A quality control engineer measures the voltages of a random sample of 30 batteries from his production line. From the sample mean voltage $\bar{x} = 7.52$ and the sample standard deviation of voltages $s = .90$, he wishes to estimate tolerance limits that will, on the average, contain 95% of the population of batteries. Assuming the distribution of battery voltages to be normal, what shall these tolerance limits be?

The tolerance limits will be of the form $\bar{x} \pm k_{5,.95}s$. Entering Table 3 with $n = 30$, he finds $k_{5,.95} = 2.079$. Hence the tolerance limits are:

$$7.52 \pm 2.079(.90) = 7.52 \pm 1.87 = 5.65 \text{ to } 9.39.$$

Notice that $k_{5,.95} = 2.079$ is larger than the value 1.96 that would be used if $\mu$ and $\sigma$ were known.
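The battery calculation, and the meaning of “on the average”, can be checked with the sketch below. It reproduces the limits 5.65 to 9.39 and then, for simulated standard normal samples of 30 (an illustrative assumption, taking $\mu = 0$, $\sigma = 1$ without loss of generality), averages the population proportion covered by $\bar{x} \pm ks$: with $k = 2.079$ the average is close to .95, while $k = 1.96$ falls short, which is why the larger factor is needed.

```python
import random
from statistics import NormalDist, fmean, stdev


def tolerance_limits(xbar, s, k):
    """Two-sided limits xbar - k*s and xbar + k*s."""
    return xbar - k * s, xbar + k * s


# Battery example: n = 30, xbar = 7.52, s = .90, k_{5,.95} = 2.079 (Table 3).
low, high = tolerance_limits(7.52, 0.90, 2.079)
print(f"{low:.2f} to {high:.2f}")  # 5.65 to 9.39


def average_coverage(n, k, trials, seed=4):
    """Average over simulated N(0, 1) samples of size n of the population proportion p
    contained in xbar +/- k*s (property B1)."""
    rng = random.Random(seed)
    phi = NormalDist().cdf
    total = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        xbar, s = fmean(sample), stdev(sample)
        total += phi(xbar + k * s) - phi(xbar - k * s)
    return total / trials


print(round(average_coverage(30, 2.079, 5000), 3))  # close to .95
print(round(average_coverage(30, 1.960, 5000), 3))  # noticeably below .95
```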

One-sided tolerance limits. Suppose now the problem is to find the value of $k'$ such that, on the average, the proportion of the normal population less than $\bar{x} + k's$ is a specified value $\alpha$. By the same procedure as in the proof for the two-sided case (Appendix) it may be shown that

$$k' = k_{5,2\alpha-1}. \qquad (5)$$

A similar result holds if the proportion of the normal population greater than $\bar{x} - k's$ is to be a specified value $\alpha$, on the average.

Example: A pilot run of 40 electron tubes is made. For each tube the plate current in milliamperes, $x$, under normal operating voltages, is measured; for the sample $\bar{x} = 12.25$, $s = .68$. From past experience with similar tubes, it is known that $x$ is normally distributed. What procedure shall be followed to determine the value of $L$ such that 99% of the population of tubes will, on the average, have a plate current less than $L$?

We may write

$$L = \bar{x} + k'_{.99}s.$$

Then according to (5)

$$k'_{.99} = k_{5,2(.99)-1} = k_{5,.98}.$$

Table 3 furnishes $k_{5,.98} = 2.455$ for $n = 40$. Hence

$$L = 12.25 + 2.455(.68) = 13.92.$$
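The same kind of check applies to the one-sided limit. The sketch below reproduces $L = 13.92$ and, for simulated standard normal samples of 40 (an illustrative assumption), averages the proportion of the population lying below $\bar{x} + 2.455s$; under the normality assumption of the example this average should be close to .99.

```python
import random
from statistics import NormalDist, fmean, stdev

# Tube example: n = 40, xbar = 12.25, s = .68; k'_{.99} = k_{5,.98} = 2.455 (Table 3).
L_upper = 12.25 + 2.455 * 0.68
print(round(L_upper, 2))  # 13.92


def average_proportion_below(n, k_prime, trials, seed=5):
    """Average over simulated N(0, 1) samples of size n of the population
    proportion lying below xbar + k' * s."""
    rng = random.Random(seed)
    phi = NormalDist().cdf
    total = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        total += phi(fmean(sample) + k_prime * stdev(sample))
    return total / trials


print(round(average_proportion_below(40, 2.455, 5000), 3))  # close to .99
```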


4.1.3 Population mean unknown, standard deviation known. In this case an interval of the form

$$\bar{x} \pm k_6\sigma \qquad (6)$$

must be used. Using the same method as in the proof given in the Appendix, the following result may be derived:

If the expected value $E(p)$ of the proportion $p$ of the normal universe contained in (6) is to be $\alpha$, then

$$k_6 = L_{1-\alpha}\sqrt{1 + \frac{1}{n}}.$$

For purposes of comparison, $k_6$ is given in Table 2 for $\alpha = .50$ and for $n = 2(1)30, 40, 60, 120, \infty$.
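As a check on the formula for $k_6$, the sketch below evaluates $L_{1-\alpha}\sqrt{1 + 1/n}$ with $\alpha = .50$ for several $n$ and prints it beside the corresponding Table 2 entries (.826, .754, .739, .707, .686, .677 for $n$ = 2, 4, 5, 10, 30, 120). The factor $\sqrt{1 + 1/n}$ reflects that a single new observation $x$ satisfies $x - \bar{x} \sim N(0, \sigma^2(1 + 1/n))$, so that, by the argument of Section 5, the average proportion covered equals the probability that a new observation falls in the interval.

```python
from math import sqrt
from statistics import NormalDist


def k6(n, alpha):
    """k_6 = L_{1-alpha} * sqrt(1 + 1/n)."""
    L = NormalDist().inv_cdf(0.5 + alpha / 2)  # exceeded in absolute value w.p. 1 - alpha
    return L * sqrt(1 + 1 / n)


for n, printed in [(2, .826), (4, .754), (5, .739), (10, .707), (30, .686), (120, .677)]:
    print(n, round(k6(n, 0.50), 3), printed)
```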

4.1.4 Population mean known, standard deviation unknown. In this case the interval

$$\mu \pm k_7 s \qquad (7)$$

must be used. Again using the same method as in the proof of the Appendix, the appropriate value for $k_7$ to include, on the average, $\alpha$ is given by

$$k_7 = t_{1-\alpha,n-1}.$$

For purposes of comparison, values of $k_7$ are given in Table 2, for $\alpha = .50$ and $n = 2(1)30, 40, 60, 120, \infty$.

4.2 Confidence statement about tolerance interval. A number of papers 
have been written on the problem of confidence statements for toler- 
ance intervals [2], [3], [6], [7], [8], [9]. The problem may be illustrated 
as follows: 

4.2.1 Population mean and standard deviation unknown. Suppose the battery engineer mentioned in Section 4.1.2 asked the following question: What value of $k$ shall I take so that I can be 95% confident that $\bar{x} \pm ks$ will include at least 80% of my population of batteries?

Bowker [3, pp. 102-107] gives extensive tables of $k$ such that “in













TABLE 2
FACTORS FOR TOLERANCE INTERVALS

Columns $k_4$ to $k_7$: intervals that will include, on the average, 50% of the population.
Columns $k_8$ and $k_9$: intervals that will include at least 50% of the population 50% of the time.

μ known (K) or unknown (U):  K  U  U  K  K  U
σ known (K) or unknown (U):  K  U  K  U  U  K
Form of interval:  $\mu \pm k_4\sigma$,  $\bar{x} \pm k_{5,.50}s$,  $\bar{x} \pm k_6\sigma$,  $\mu \pm k_7 s$,  $\mu \pm k_8 s$,  $\bar{x} \pm k_9\sigma$

n    $k_4$  $k_{5,.50}$  $k_6$  $k_7$  $k_8$  $k_9$
2    .674   1.225   .826   1.000   1.000   .754
3    .674   .942    .779   .816    .810    (illegible)
4    .674   .855    .754   .765    .759    .714
5    .674   .812    .739   .741    .736    .706
6    .674   .785    .729   .727    .723    .700
7    .674   .768    .721   .718    .714    .697
8    .674   .754    .715   .711    .708    .694
9    .674   .744    .711   .706    .704    .692
10   .674   .737    .707   .703    .701    .690
11   .674   .731    .704   .700    .698    .688
12   .674   .725    .702   .697    .698    .687
13   .674   .721    .700   .695    .694    .686
14   .674   .718    .698   .694    .692    .686
15   .674   .715    .697   .692    .691    .685
16   .674   .712    .695   .691    .690    .684
17   .674   .710    .694   .690    .689    .684
18   .674   .708    .693   .689    .688    .683
19   .674   .706    .692   .688    .687    .683
20   .674   .705    .691   .688    .687    .682
21   .674   .703    .690   .687    .686    .682
22   .674   .701    .690   .686    .685    .681
23   .674   .701    .689   .686    .685    .681
24   .674   .699    .688   .685    .684    .681
25   .674   .699    .688   .685    .684    .681
26   .674   .697    .687   .684    .684    .680
27   .674   .697    .687   .689    .683    .680





























TABLE 2 (cont.)

n    $k_4$  $k_{5,.50}$  $k_6$  $k_7$  $k_8$  $k_9$
28   .674   .696    .686   .684    .683    .680
29   .674   .695    .686   .683    .683    .680
30   .674   .694    .686   .683    .682    .680
40   .674   .689    .683   .681    .680    .678
60   .674   .685    .680   .679    .678    .677
120  .674   .680    .677   .677    .676    .676
∞    .674   .674    .674   .674    .674    .674

For discussion see Section: 4.1.1 ($k_4$), 4.1.2 ($k_{5,.50}$), 4.1.3 ($k_6$), 4.1.4 ($k_7$), 4.2.2 ($k_8$), 4.2.3 ($k_9$).





a large series of samples for normal universes a certain proportion $\gamma$ of the intervals $\bar{x} \pm ks$ will include $P$ or more of the universe; $\gamma$ is called the ‘confidence coefficient’ since it is a measure of the confidence with which we may assert that a given tolerance range includes at least $P$ of the universe.” In these tables $\gamma = .75, .90, .95, .99, .999$.

4.2.2 Population mean known, standard deviation unknown. Consider the case where $\mu$ is known and $\sigma$ unknown. Then an interval of the form

$$\mu \pm k_8 s \qquad (8)$$

can be set up to include at least a proportion $P$ of the population with confidence $\gamma$ as follows:

Let us take specific values of $P = .80$ and $\gamma = .95$ for illustrative purposes. We note first that $p$ is monotonic increasing with $s$ (and with $s^2$). Hence when $s^2$ takes on a value exceeded 95% of the time (call it $s_{.95}^2$), $p$ will take on a value exceeded 95% of the time. But

$$s_{.95}^2 = \frac{\sigma^2\chi^2_{.95,n-1}}{n-1}.$$

Choosing $k_8$ so that $k_8 s_{.95} = L_{.20}\sigma$ makes $\mu \pm k_8 s_{.95}$ contain exactly 80% of the population, so the appropriate value of $k_8$ is

$$k_8 = \frac{L_{.20}}{\sqrt{\chi^2_{.95,n-1}/(n-1)}}.$$

Values of $k_8$ for $P = \gamma = .50$ for $n = 2(1)30, 40, 60, 120, \infty$ are given in Table 2, for purposes of comparison.

For general $P$, $\gamma$,

$$k_8 = \frac{L_{1-P}}{\sqrt{\chi^2_{\gamma,n-1}/(n-1)}}.$$
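Closed forms make this construction easy to test for one sample size. For $n = 3$ the chi-square variable has 2 degrees of freedom and $P(\chi^2 > x) = e^{-x/2}$, so $\chi^2_{\gamma,2} = -2\ln\gamma$. The sketch below computes $k_8$ from the general formula, checks the $n = 3$ entry .810 of Table 2 ($P = \gamma = .50$), and simulates how often $\mu \pm k_8 s$ contains at least 80% of a standard normal population when $\gamma = .95$. Restricting to $n = 3$ is an assumption made only to avoid a chi-square quantile routine.

```python
import math
import random
from statistics import NormalDist, stdev


def chi2_2df(gamma):
    """Chi-square value with 2 d.f. exceeded with probability gamma.
    Closed form because P(chi2_2 > x) = exp(-x/2); valid only for n - 1 = 2."""
    return -2.0 * math.log(gamma)


def k8_n3(P, gamma):
    """k_8 = L_{1-P} / sqrt(chi2_{gamma,n-1} / (n - 1)) for n = 3."""
    L = NormalDist().inv_cdf(1 - (1 - P) / 2)  # exceeded in absolute value w.p. 1 - P
    return L / math.sqrt(chi2_2df(gamma) / 2)


def confidence_level(P, gamma, trials, seed=6):
    """Fraction of N(0, 1) samples of size 3 for which mu +/- k_8*s (mu = 0) covers at least P."""
    rng = random.Random(seed)
    k8 = k8_n3(P, gamma)
    phi = NormalDist().cdf
    hits = 0
    for _ in range(trials):
        s = stdev([rng.gauss(0.0, 1.0) for _ in range(3)])
        hits += 2 * phi(k8 * s) - 1 >= P
    return hits / trials


print(round(k8_n3(0.50, 0.50), 3))                    # the n = 3 entry of the k_8 column
print(round(confidence_level(0.80, 0.95, 20000), 3))  # close to .95
```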


4.2.3 Population mean unknown, standard deviation known. In this case, an interval of the type

$$\bar{x} \pm k_9\sigma \qquad (9)$$

must be used. Let us solve for $k_9$ when $P = .80$, $\gamma = .95$ to illustrate the reasoning.

We first note that 95% of the $\bar{x}$'s lie in the interval $\mu \pm (L_{.05}/\sqrt{n})\sigma$; that is, 95% of the $\bar{x} \pm k_9\sigma$ intervals have their centers inside the interval $\mu \pm (L_{.05}/\sqrt{n})\sigma$. Now we find $k_9$ such that the normal curve area between $\mu + (L_{.05}/\sqrt{n})\sigma - k_9\sigma$ and $\mu + (L_{.05}/\sqrt{n})\sigma + k_9\sigma$ is .80. Since an interval of fixed width contains more of the population the closer its center is to $\mu$, 95% of the $\bar{x} \pm k_9\sigma$ intervals will then contain $p \ge .80$ (namely those intervals for which $\bar{x}$ lies in $\mu \pm (L_{.05}/\sqrt{n})\sigma$).

It follows that the interval (9) will contain a proportion .80 or more of the population, .95 of the time.

Values of $k_9$ for $P = \gamma = .50$ are given in Table 2 for $n = 2(1)30, 40, 60, 120, \infty$. For general $P$, $\gamma$, $k_9$ is the value such that the normal curve area between $\mu + (L_{1-\gamma}/\sqrt{n})\sigma - k_9\sigma$ and $\mu + (L_{1-\gamma}/\sqrt{n})\sigma + k_9\sigma$ is $P$.
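Since $k_9$ is defined only implicitly, it has to be found numerically. The sketch below solves for it by bisection (a swapped-in root-finding method, not necessarily the one used to compute Table 2), taking $\sigma = 1$ and $c = L_{1-\gamma}/\sqrt{n}$; with $P = \gamma = .50$ it reproduces the Table 2 entries .754, .714 and .690 for $n$ = 2, 4 and 10.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()


def k9(n, P, gamma, tol=1e-10):
    """Bisection for k_9 with sigma = 1: the N(0, 1) area between c - k and c + k equals P,
    where c = L_{1-gamma} / sqrt(n)."""
    c = Z.inv_cdf(1 - (1 - gamma) / 2) / sqrt(n)

    def area(k):
        return Z.cdf(c + k) - Z.cdf(c - k)  # increasing in k

    lo, hi = 0.0, c + 10.0  # area(lo) = 0 < P and area(hi) is essentially 1
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if area(mid) < P:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


for n in (2, 4, 10, 30):
    print(n, round(k9(n, 0.50, 0.50), 3))
print(round(k9(10, 0.80, 0.95), 3))  # the P = .80, gamma = .95 case discussed above, n = 10
```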

5. Relationship between confidence intervals and tolerance intervals. There is a very interesting relationship between confidence intervals and tolerance intervals that may be illustrated by the following example:

Suppose, as in Section 3.3, we wanted to find a prediction (or “confidence”) interval for the mean of a second sample. But now let $n_2 = 1$. In other words, we will now be finding a confidence interval for a single future observation. According to the result in Section 3.3, our answer is

$$\bar{x}_1 \pm t_{1-\alpha,n_1-1}\sqrt{\frac{1}{n_1} + \frac{1}{1}}\;s_1 = \bar{x}_1 \pm t_{1-\alpha,n_1-1}\sqrt{\frac{n_1+1}{n_1}}\;s_1, \qquad (10)$$

where $\alpha$ is the confidence coefficient.

What does this mean? One way of looking at it is that if repeatedly a sample of size $n_1$ is first drawn and then a second sample of one item is drawn, then a proportion $\alpha$ of the time the single item will lie in the interval (10). But a little thought shows that this is exactly equivalent to stating that in repeated samples of size $n_1$, the average proportion, $p$, of the population contained in (10) is $\alpha$. In other words, confidence limits with confidence coefficient $\alpha$ for a second sample of size one are identical with tolerance limits that will include a proportion $\alpha$ on the average. This is confirmed by the fact that (10) is the same as (4) (except for the subscript 1).

The above is an illustration of a theorem by Paulson [5]:

“If confidence limits $U_1(x_1, \ldots, x_n)$ and $U_2(x_1, \ldots, x_n)$ on a probability level $= \alpha$ are determined for $g$, a function of a future sample of







TABLE 3
FACTORS, $k_5$, FOR TOLERANCE INTERVALS SUCH THAT $\bar{x} \pm k_{5,\alpha}s$ WILL INCLUDE A PROPORTION $\alpha$ OF THE POPULATION, ON THE AVERAGE

Sample size, n  $k_{5,.50}$  $k_{5,.75}$  $k_{5,.90}$  $k_{5,.95}$  $k_{5,.98}$  $k_{5,.99}$  $k_{5,.999}$
2    1.225  2.957  7.733  15.562  38.973  77.964  779.699
3    .942   1.852  3.372  4.969   8.042   11.460  36.486
4    .855   1.591  2.631  3.558   5.077   6.530   14.469
5    .812   1.473  2.335  3.041   4.105   5.043   9.432
6    .785   1.405  2.176  2.777   3.635   4.355   7.409
7    .768   1.361  2.077  2.616   3.360   3.963   6.370
8    .754   1.330  2.010  2.508   3.180   3.711   5.733
9    .744   1.307  1.961  2.431   3.053   3.536   5.314
10   .737   1.290  1.922  2.372   2.959   3.409   5.014
11   .731   1.276  1.893  2.327   2.887   3.310   4.791
12   .725   1.264  1.869  2.291   2.829   3.233   4.618
13   .721   1.255  1.849  2.261   2.782   3.170   4.481
14   .718   1.246  1.833  2.236   2.743   3.118   4.369
15   .715   1.239  1.819  2.215   2.710   3.075   4.276
16   .712   1.234  1.807  2.197   2.682   3.038   4.198
17   .710   1.228  1.797  2.181   2.658   3.006   4.131
18   .708   1.224  1.788  2.168   2.637   2.977   4.074
19   .706   1.220  1.779  2.156   2.618   2.953   4.024
20   .705   1.216  1.772  2.145   2.602   2.932   3.979
21   .703   1.213  1.766  2.135   2.587   2.912   3.941
22   .701   1.210  1.760  2.127   2.575   2.895   3.905
23   .701   1.207  1.754  2.119   2.562   2.880   3.874
24   .699   1.205  1.749  2.112   2.552   2.865   3.845
25   .699   1.202  1.745  2.105   2.541   2.852   3.819
26   .697   1.200  1.741  2.099   2.532   2.840   3.796
27   .697   1.198  1.737  2.094   2.524   2.830   3.775
28   .696   1.197  1.733  2.088   2.517   2.820   3.755
29   .695   1.195  1.730  2.083   2.509   2.810   3.737
30   .694   1.193  1.727  2.079   2.503   2.802   3.719
40   .689   1.182  1.706  2.047   2.455   2.741   3.602
60   .685   1.171  1.686  2.017   2.411   2.684   3.492
120  .680   1.161  1.665  1.988   2.368   2.628   3.388
∞    .674   1.150  1.645  1.960   2.326   2.576   3.291

See Section 4.1.2 for a discussion of this case.











b 




$k$ observations, with distribution $\psi(g)$, and $p = \int_{U_1}^{U_2}\psi(g)\,dg$, then $E(p) = \alpha$.”

In the illustration of this section, g corresponds to the value of the 
single future observation and k=1. Similarly we can check the results 
of Sections 4.1.3 and 4.1.4 by the use of Paulson’s theorem. 


APPENDIX 


Mathematical proof of Wilks’ result. The details of the derivation 
(independently obtained by I. R. Savage of the Statistical Engineering 
Laboratory, National Bureau of Standards) of the result of Section 
4.1.2 are given, since the method is a suggestive one. 

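The problem is to determine $k$ so that the average $p$ contained in $\bar{x}_i \pm ks_i$ $(i = 1, 2, \ldots)$ will be $\alpha$. By an appropriate linear transformation, the problem may be reduced to that of finding

$$E(p) = C_1\int_0^\infty\int_{-\infty}^{\infty}\left[\int_{\bar{x}-ks}^{\bar{x}+ks} e^{-\frac12 x^2}\,dx\right] s^{n-2}e^{-\frac12[n\bar{x}^2+(n-1)s^2]}\,d\bar{x}\,ds,$$

where $C_1$ is a constant free of $k$. In the following, $C_i$ = constant free of $k$.

The conditions for differentiating under the integral hold. Hence we have

$$\frac{\partial E(p)}{\partial k} = C_1\int_0^\infty\int_{-\infty}^{\infty}\left[s\,e^{-\frac12(\bar{x}+ks)^2} + s\,e^{-\frac12(\bar{x}-ks)^2}\right] s^{n-2}e^{-\frac12[n\bar{x}^2+(n-1)s^2]}\,d\bar{x}\,ds$$

$$= C_1\int_0^\infty\int_{-\infty}^{\infty} e^{-\frac12\left[\left(\sqrt{n+1}\,\bar{x}+\frac{ks}{\sqrt{n+1}}\right)^2+\left(n-1+k^2\frac{n}{n+1}\right)s^2\right]} s^{n-1}\,d\bar{x}\,ds + C_1\int_0^\infty\int_{-\infty}^{\infty} e^{-\frac12\left[\left(\sqrt{n+1}\,\bar{x}-\frac{ks}{\sqrt{n+1}}\right)^2+\left(n-1+k^2\frac{n}{n+1}\right)s^2\right]} s^{n-1}\,d\bar{x}\,ds.$$

Putting $u = \sqrt{n+1}\,\bar{x} \pm ks/\sqrt{n+1}$ in the two integrals,

$$\frac{\partial E(p)}{\partial k} = 2C_1\int_0^\infty\int_{-\infty}^{\infty} e^{-\frac12 u^2}\,\frac{du}{\sqrt{n+1}}\; s^{n-1}e^{-\frac12\left(n-1+k^2\frac{n}{n+1}\right)s^2}\,ds = C_2\int_0^\infty s^{n-1}e^{-\frac12\left(n-1+k^2\frac{n}{n+1}\right)s^2}\,ds.$$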





















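Then, putting $v = \frac12 s^2$,

$$\frac{\partial E(p)}{\partial k} = C_2\int_0^\infty (2v)^{n/2-1}e^{-v\left(n-1+k^2\frac{n}{n+1}\right)}\,dv = \frac{C_3}{\left(n-1+k^2\frac{n}{n+1}\right)^{n/2}}.$$

Hence, since $E(p) = 0$ when $k = 0$,

$$E(p) = C_3\int_0^k \frac{d\kappa}{\left(n-1+\kappa^2\frac{n}{n+1}\right)^{n/2}}.$$

Now let

$$t = \kappa\sqrt{\frac{n}{n+1}},$$

so that

$$E(p) = C_4\int_0^{k\sqrt{n/(n+1)}}\frac{dt}{(n-1+t^2)^{n/2}} = C_5\int_0^{k\sqrt{n/(n+1)}}\frac{dt}{\left(1+\frac{t^2}{n-1}\right)^{n/2}}.$$

But the integrand is, apart from its constant, the well known Student-t density function with $n-1$ degrees of freedom. Now when $k = \infty$, $E(p) = 1$. Since the integral covers only the positive half of a symmetric density, $C_5$ must be twice the constant of the Student-t distribution, so that $E(p)$ is the probability that $|t| \le k\sqrt{n/(n+1)}$. Therefore the result of Section 4.1.2 follows:

$$k = t_{1-\alpha,n-1}\sqrt{\frac{n+1}{n}}.$$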
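For $n = 2$ and $n = 3$ the Student-t quantiles have closed forms (with 1 degree of freedom, the Cauchy case, $P(|t| \le x) = (2/\pi)\arctan x$; with 2 degrees of freedom, $P(|t| \le x) = x/\sqrt{2 + x^2}$), so the result just derived can be compared directly with the first two rows of Table 3. The sketch below does this; agreement is to about 0.1 per cent, though not always to the last printed digit.

```python
from math import pi, sqrt, tan


def t_two_sided(alpha, df):
    """Student-t value with df degrees of freedom exceeded in absolute value with
    probability 1 - alpha; closed form for df = 1 (Cauchy) and df = 2 only."""
    if df == 1:
        return tan(pi * alpha / 2)                 # P(|T| <= t) = (2/pi) arctan t
    if df == 2:
        return alpha * sqrt(2 / (1 - alpha ** 2))  # P(|T| <= t) = t / sqrt(2 + t^2)
    raise ValueError("closed form only for 1 or 2 degrees of freedom")


def k5(n, alpha):
    """k_{5,alpha} = t_{1-alpha,n-1} * sqrt((n + 1) / n)."""
    return t_two_sided(alpha, n - 1) * sqrt((n + 1) / n)


alphas = [.50, .75, .90, .95, .98, .99, .999]
table3 = {
    2: [1.225, 2.957, 7.733, 15.562, 38.973, 77.964, 779.699],
    3: [.942, 1.852, 3.372, 4.969, 8.042, 11.460, 36.486],
}
for n, row in table3.items():
    for alpha, printed in zip(alphas, row):
        print(n, alpha, round(k5(n, alpha), 3), printed)
```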





[1] Baker, G. A., “The probability that the mean of a second sample will differ 
from the mean of a first sample by less than a certain multiple of the standard 
deviation of the first sample,” Annals of Mathematical Statistics, 6 (1935), 
197-201. 

[2] Bowker, A. H., “Computation of factors for tolerance limits on a normal 
distribution when the sample is large,” Annals of Mathematical Statistics, 17 
(1946), 238-40. 











564 AMERICAN STATISTICAL ASSOCIATION JOURNAL, SEPTEMBER 1953 


[3] Bowker, A. H., “Tolerance limits for normal distributions,” Chapter 2 of 
Statistical Research Group, Columbia University, Techniques of Statistical 
Analysis. New York: McGraw-Hill (1947), 95-110. 

[4] Mood, A. M., Introduction to the Theory of Statistics. New York: McGraw-Hill 
(1950), Chapter 11, 220-44. 

[5] Paulson, E., “A note on tolerance limits,” Annals of Mathematical Statistics, 
14 (1943), 90-3. 

[6] Wald, A., “Setting of tolerance limits when the sample is large,” Annals of 
Mathematical Statistics, 13 (1942), 389-99. 

[7] Wald, A., and Wolfowitz, J., “Tolerance limits for a normal distribution,” 
Annals of Mathematical Statistics, 17 (1946), 208-15. 

[8] Wilks, S. S., “Determination of sample sizes for setting tolerance limits,” Annals of Mathematical Statistics, 12 (1941), 91-6.

[9] Wallis, W. Allen, “Tolerance intervals for linear regression,” in Second 
Berkeley Symposium on Mathematical Statistics and Probability, edited by Jerzy 
Neyman. Berkeley: University of California Press (1951), 43-51. 










A STATISTICALLY PRECISE AND RELATIVELY 
SIMPLE METHOD OF ESTIMATING THE BIO- 
ASSAY WITH QUANTAL RESPONSE, BASED 
ON THE LOGISTIC FUNCTION 


JOSEPH BERKSON
Mayo Clinic, Rochester, Minnesota 


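The logistic function is given by

$$P = \frac{1}{1 + e^{-(\alpha + \beta x)}}. \qquad (1)$$

Its straight-line transform, with $Q = 1 - P$, is

$$\text{logit } P = \ln\frac{P}{Q} = \alpha + \beta x, \qquad (2)$$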


so that, if the logit $P$ is plotted against $x$, the points will fall on a straight line, with $\alpha$ as the intercept and $\beta$ the slope.
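For work by machine rather than with the tables, equations (1) and (2) amount to two small functions. A minimal Python sketch, with illustrative function names that are not from the paper; the final loop confirms that applying the logit to (1) gives back the straight line $\alpha + \beta x$:

```python
import math


def logistic(x: float, alpha: float, beta: float) -> float:
    """Equation (1): P = 1 / (1 + exp(-(alpha + beta * x)))."""
    return 1.0 / (1.0 + math.exp(-(alpha + beta * x)))


def logit(p: float) -> float:
    """Equation (2): ln(P / Q) with Q = 1 - P; defined only for 0 < p < 1."""
    return math.log(p / (1.0 - p))


# The logit of (1) is linear in x, with intercept alpha and slope beta.
alpha, beta = -2.0, 0.5
for x in (0.0, 2.0, 4.0, 6.0):
    assert abs(logit(logistic(x, alpha, beta)) - (alpha + beta * x)) < 1e-9
```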

The function (1) has had many statistical applications [37] and has been advanced for use in bio-assay by, among others, Emmens [17], Wilson and Worcester [46], and Berkson [7]. In bio-assay $x$ measures the “dose”¹ and $P$ the “response.” If the response is measured, not in terms of a continuous scale such as weight or length, but in terms of the observed proportion $p$ affected out of $n$ “exposed,” the response is said to be “quantal,” and in this statistical model it is assumed that the observation $p$ at $x$ can be considered a random variable binomially distributed around the “true” $P$ at $x$, with variance $\sigma_p^2 = PQ/n$.

For this situation, the present writer has advanced a method [7] of calculating $a$ and $b$, which are estimates of $\alpha$ and $\beta$ respectively, hereafter called the “minimum logit $\chi^2$ estimate,” based on a minimization of the following quantity, called the “logit $\chi^2$”:


$$\chi^2(\text{logit}) = \sum npq\,(l - \hat l)^2, \qquad (3)$$


where $n$ is the number exposed at $x$, $p = 1 - q$ is the observed proportion affected, $l = \ln(p/q)$ is the logit of $p$, and $\hat l = \ln(\hat p/\hat q) = a + bx$ is the logit of $\hat p$, the estimate of $P$ at $x$ given by (1) with the estimates $a$, $b$ replacing the parameters $\alpha$, $\beta$.
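A minimal Python sketch of (3). It assumes the data arrive as parallel lists of doses $x$, numbers exposed $n$ and numbers affected $r$ (names chosen here for illustration), and it simply skips doses with $p = 0$ or $p = 1$, whose logits are infinite; that treatment is an assumption of the sketch, not a rule from the paper.

```python
import math


def logit_chi_square(doses, n_exposed, n_affected, a, b):
    """Logit chi-square of equation (3): sum of n*p*q*(l - l_hat)**2.

    Doses with p = 0 or p = 1 are skipped (an assumption of this sketch).
    """
    total = 0.0
    for x, n, r in zip(doses, n_exposed, n_affected):
        p = r / n
        if p <= 0.0 or p >= 1.0:
            continue
        l_obs = math.log(p / (1.0 - p))   # observed logit l
        l_fit = a + b * x                 # fitted logit l_hat
        total += n * p * (1.0 - p) * (l_obs - l_fit) ** 2
    return total
```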

The minimum logit $\chi^2$ estimate has the following properties.² 1. The logit $\chi^2$ (3) is distributed asymptotically as $\chi^2$. 2. The estimate is asymptotically efficient; therefore, as the number of animals at each dose $n \to \infty$, the variance of the asymptotic distribution is a minimum, given by $1/I$, where $I = E(\partial \ln \phi/\partial\theta)^2$, $\phi$ being the probability of the total sample and $\theta$ the parameter $\alpha$ or $\beta$ which is to be estimated. 3. It is sufficient and therefore, in the concept of Fisher, extracts the total amount of information available in the sample which is relevant to the estimated parameters [25]. 4. For finite samples (a) it has smaller sampling variance than $1/I$, and (b) it has smaller sampling error (mean square error) and smaller variance about the mean than the maximum likelihood estimate. These properties hold for all values of the parameters [3, 4, 40].

¹ $x$ is frequently the logarithm of the dose rather than the dose measured directly.
² See Appendix Note 2 for reference to other estimates.

The normal equations for obtaining the minimum logit $\chi^2$ estimates of $\alpha$ and $\beta$ are

$$\sum npq\,(l - \hat l) = 0, \qquad (4)$$

$$\sum npq\,x\,(l - \hat l) = 0. \qquad (5)$$


Solving (4) and (5) amounts simply to obtaining a weighted least squares solution of the straight line

$$\hat l = a + bx,$$

using $npq$ as the weight of the observation $l$. The estimates are given by


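$$b = \frac{\sum npq\,lx - \dfrac{\sum npq\,l\,\sum npq\,x}{\sum npq}}{\sum npq\,x^2 - \dfrac{\left(\sum npq\,x\right)^2}{\sum npq}} = \frac{\sum npq\,(l - \bar l)(x - \bar x)}{\sum npq\,(x - \bar x)^2}, \qquad (6)$$

$$a = \bar l - b\bar x = \frac{\sum npq\,l - b\sum npq\,x}{\sum npq}, \qquad (7)$$

where the weighted mean logit is $\bar l = \sum npq\,l/\sum npq$ and the weighted mean of $x$ is $\bar x = \sum npq\,x/\sum npq$.

It will be noted that equations (6) and (7) contain the quantities $pq$ and $pql$, which are functions of $p$, the observed fraction affected. These have been tabled with $p$ as argument, giving $w = pq$ and $wl = pql$ (Table 3). The estimates then are

$$b = \frac{\sum nwlx - \dfrac{\sum nwl\,\sum nwx}{\sum nw}}{\sum nwx^2 - \dfrac{\left(\sum nwx\right)^2}{\sum nw}} = \frac{\sum nw\,(l - \bar l)(x - \bar x)}{\sum nw\,(x - \bar x)^2}, \qquad (8)$$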
















ESTIMATING THE BIO-ASSAY WITH QUANTAL RESPONSE 567 


$$a = \bar l - b\bar x = \frac{\sum nwl - b\sum nwx}{\sum nw}. \qquad (9)$$

The E.D.50, $\gamma$, the dosage value $x$ which produces 50 per cent effect, is the value of $x$ for which $P = 0.5$, and is obtained by letting $P = 0.5$ in (1), yielding

$$\gamma = -\frac{\alpha}{\beta}.$$

The estimate of $\gamma$, represented as $x_{50}$, is given by

$$x_{50} = -\frac{a}{b}. \qquad (10)$$


Equations (8) and (9) are explicit solutions of (4) and (5) and, being entirely in terms of the observations, directly provide definitive estimates of $\alpha$ and $\beta$.
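Because (8) to (10) describe an ordinary weighted least squares fit, the whole computation can be sketched in a few lines of Python. The sketch is illustrative only: the function and argument names are hypothetical, the logits and weights are computed directly rather than read from Tables 1 and 3, and doses with $p = 0$ or $p = 1$ are rejected, since their logits are infinite.

```python
import math


def min_logit_chi2(doses, n_exposed, n_affected):
    """Minimum logit chi-square estimates (a, b, x50) via equations (8)-(10).

    Weighted least squares of the observed logit l on x, with weight n*w,
    w = p*q.  This sketch requires 0 < p < 1 at every dose.
    """
    rows = []
    for x, n, r in zip(doses, n_exposed, n_affected):
        p = r / n
        if not 0.0 < p < 1.0:
            raise ValueError("this sketch needs 0 < p < 1 at every dose")
        w = p * (1.0 - p)              # w = pq, upper figure of Table 3
        l = math.log(p / (1.0 - p))    # logit l = ln(p/q), Table 1
        rows.append((x, n * w, l))
    s_nw = sum(nw for _, nw, _ in rows)
    x_bar = sum(nw * x for x, nw, _ in rows) / s_nw
    l_bar = sum(nw * l for _, nw, l in rows) / s_nw
    sxl = sum(nw * (l - l_bar) * (x - x_bar) for x, nw, l in rows)
    sxx = sum(nw * (x - x_bar) ** 2 for x, nw, _ in rows)
    b = sxl / sxx                      # equation (8)
    a = l_bar - b * x_bar              # equation (9)
    return a, b, -a / b                # equation (10): x50 = -a/b
```

For instance, with doses 1, 2, 3, ten animals at each and 2, 5, 8 affected, the observed logits lie exactly on a line, and the sketch returns $b = \ln 4 \approx 1.386$, $a = -2\ln 4$ and $x_{50} = 2$, up to rounding.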

This is in contrast with the method of probits using maximum likelihood, as advanced by Bliss [10, 11] and by Finney [22], which, beginning with a provisional solution obtained graphically by eye, requires an iterative procedure in which the maximum likelihood estimate is approached asymptotically as the number of iterative cycles is increased, but which in general does not actually attain the exact definitive maximum likelihood estimate.³

Tables giving the logit, $l$, for argument $p$ (Table 1),⁴ the antilogit, $p$, for argument $l$ (Table 2), and $w = pq$ and $wl = pql$ for argument $p$ (Table 3) are provided. Also illustrated are two graph papers (Fig. 1), which have been published⁵ for the simple use of the straight-line transform, one with an arithmetically spaced scale for $x$ and the other providing also logarithmic scales of different total extent for different ranges of dosage.


CALCULATION OF STANDARD ERRORS 


We may write the estimated logit linear equation as

$$\hat l = a + bx = a' + b(x - \bar x),$$

where $a' = \bar l = a + b\bar x$.





³ For further comments on difficulties involved with the use of iterative procedures, see Appendix Note 1.

⁴ See Appendix Note 3 on the definition of logit.

⁵ Cost defrayed by the Mayo Foundation for Medical Education and Research. Sold by the Codex Book Company, Norwood, Massachusetts.

























TABLE 1
LOGITS
Thousandths, for p in left column.
For p less than .50 on left, logit is negative. For p greater than .50 on right, logit is positive.
[Tabled values not recoverable from the source scan.]
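Each entry of Table 1 is simply $l = \ln(p/q)$, so a row can be regenerated by machine where the printed page is illegible. A Python sketch, assuming rows are indexed by hundredths of $p$ with eleven steps of .001 per row as in Table 3; the function name is hypothetical:

```python
import math


def logit_row(hundredths: int) -> list:
    """Logits l = ln(p/q) for p = hundredths/100 + j/1000, j = 0..10.

    Values are rounded to five decimals; None marks p = 0 or p = 1,
    where the logit is infinite.
    """
    row = []
    for j in range(11):
        p = hundredths / 100 + j / 1000
        if 0.0 < p < 1.0:
            row.append(round(math.log(p / (1.0 - p)), 5))
        else:
            row.append(None)
    return row
```

For example, `logit_row(75)` begins at 1.09861 (that is, ln 3) and ends at 1.15268; rows below .50 give the same magnitudes with negative sign, as the table heading states.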


TABLE 2
ANTILOGITS
Entries give value of p for specified positive value of logit l; if l is negative, p is 1 minus the tabled value.

l | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
0.0 | .50000 | .50250 | .50500 | .50750 | .51000 | .51250 | .51500 | .51749 | .51999 | .52248
0.1 | .52498 | .52747 | .52996 | .53245 | .53494 | .53743 | .53991 | .54240 | .54488 | .54736
0.2 | .54983 | .55231 | .55478 | .55725 | .55971 | .56218 | .56464 | .56709 | .56955 | .57200
0.3 | .57444 | .57689 | .57932 | .58176 | .58419 | .58662 | .58904 | .59146 | .59387 | .59628
0.4 | .59869 | .60109 | .60348 | .60587 | .60826 | .61064 | .61301 | .61538 | .61775 | .62011
0.5 | .62246 | .62481 | .62715 | .62948 | .63181 | .63414 | .63645 | .63876 | .64107 | .64337
0.6 | .64566 | .64794 | .65022 | .65249 | .65475 | .65701 | .65926 | .66150 | .66374 | .66597
0.7 | .66819 | .67040 | .67261 | .67481 | .67700 | .67918 | .68135 | .68352 | .68568 | .68783
0.8 | .68997 | .69211 | .69424 | .69635 | .69847 | .70057 | .70266 | .70475 | .70682 | .70889
0.9 | .71095 | .71300 | .71504 | .71708 | .71910 | .72112 | .72312 | .72512 | .72711 | .72909
1.0 | .73106 | .73302 | .73497 | .73692 | .73885 | .74077 | .74269 | .74460 | .74649 | .74838
1.1 | .75026 | .75213 | .75399 | .75584 | .75768 | .75951 | .76133 | .76315 | .76495 | .76674
1.2 | .76852 | .77030 | .77206 | .77382 | .77556 | .77730 | .77903 | .78074 | .78245 | .78415
1.3 | .78583 | .78751 | .78918 | .79084 | .79249 | .79413 | .79576 | .79738 | .79899 | .80059
1.4 | .80218 | .80377 | .80534 | .80690 | .80845 | .81000 | .81153 | .81306 | .81457 | .81608
1.5 | .81757 | .81906 | .82054 | .82201 | .82346 | .82491 | .82635 | .82778 | .82920 | .83062
1.6 | .83202 | .83341 | .83480 | .83617 | .83753 | .83889 | .84024 | .84158 | .84290 | .84422
1.7 | .84553 | .84684 | .84813 | .84941 | .85069 | .85195 | .85321 | .85446 | .85570 | .85693
1.8 | .85815 | .85936 | .86057 | .86176 | .86295 | .86413 | .86530 | .86646 | .86761 | .86876
1.9 | .86989 | .87102 | .87214 | .87325 | .87435 | .87545 | .87653 | .87761 | .87868 | .87974
2.0 | .88080 | .88184 | .88288 | .88391 | .88493 | .88595 | .88695 | .88795 | .88894 | .88993
2.1 | .89090 | .89187 | .89283 | .89379 | .89473 | .89567 | .89660 | .89752 | .89844 | .89935
2.2 | .90025 | .90114 | .90203 | .90291 | .90378 | .90465 | .90551 | .90636 | .90721 | .90805
2.3 | .90888 | .90970 | .91052 | .91133 | .91214 | .91293 | .91373 | .91451 | .91529 | .91606
2.4 | .91683 | .91759 | .91834 | .91909 | .91983 | .92056 | .92129 | .92201 | .92273 | .92344
2.5 | .92414 | .92484 | .92553 | .92622 | .92690 | .92757 | .92824 | .92891 | .92956 | .93022
2.6 | .93086 | .93150 | .93214 | .93277 | .93339 | .93401 | .93462 | .93523 | .93584 | .93643
2.7 | .93703 | .93761 | .93820 | .93877 | .93935 | .93991 | .94048 | .94103 | .94159 | .94213
2.8 | .94268 | .94321 | .94375 | .94428 | .94480 | .94532 | .94583 | .94634 | .94685 | .94735
2.9 | .94785 | .94834 | .94883 | .94931 | .94979 | .95026 | .95073 | .95120 | .95166 | .95212
3.0 | .95257 | .95302 | .95347 | .95391 | .95435 | .95478 | .95521 | .95564 | .95606 | .95648
3.1 | .95689 | .95730 | .95771 | .95811 | .95851 | .95891 | .95930 | .95969 | .96007 | .96046
3.2 | .96083 | .96121 | .96158 | .96195 | .96231 | .96267 | .96303 | .96339 | .96374 | .96408
3.3 | .96443 | .96477 | .96511 | .96544 | .96578 | .96610 | .96643 | .96675 | .96707 | .96739
3.4 | .96770 | .96802 | .96832 | .96863 | .96893 | .96923 | .96953 | .96982 | .97011 | .97040
3.5 | .97069 | .97097 | .97125 | .97153 | .97180 | .97208 | .97235 | .97262 | .97288 | .97314
3.6 | .97340 | .97366 | .97392 | .97417 | .97442 | .97467 | .97491 | .97516 | .97540 | .97564
3.7 | .97587 | .97611 | .97634 | .97657 | .97680 | .97702 | .97725 | .97747 | .97769 | .97790
3.8 | .97812 | .97833 | .97854 | .97875 | .97896 | .97916 | .97937 | .97957 | .97977 | .97996
3.9 | .98016 | .98035 | .98054 | .98073 | .98092 | .98111 | .98129 | .98148 | .98166 | .98184
4.0 | .98201 | .98219 | .98236 | .98254 | .98271 | .98288 | .98304 | .98321 | .98337 | .98354
4.1 | .98370 | .98386 | .98402 | .98417 | .98433 | .98448 | .98463 | .98478 | .98493 | .98508
4.2 | .98523 | .98537 | .98551 | .98566 | .98580 | .98594 | .98607 | .98621 | .98635 | .98648
4.3 | .98661 | .98674 | .98687 | .98700 | .98713 | .98726 | .98738 | .98751 | .98763 | .98775
4.4 | .98787 | .98799 | .98811 | .98823 | .98834 | .98846 | .98857 | .98868 | .98879 | .98890
4.5 | .98901 | .98912 | .98923 | .98933 | .98944 | .98954 | .98965 | .98975 | .98985 | .98995
4.6 | .99005 | .99015 | .99024 | .99034 | .99043 | .99053 | .99062 | .99071 | .99081 | .99090
4.7 | .99099 | .99108 | .99116 | .99125 | .99134 | .99142 | .99151 | .99159 | .99167 | .99176
4.8 | .99184 | .99192 | .99200 | .99208 | .99215 | .99223 | .99231 | .99239 | .99246 | .99253
4.9 | .99261 | .99268 | .99275 | .99283 | .99290 | .99297 | .99304 | .99310 | .99317 | .99324
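Table 2 tabulates $p = 1/(1 + e^{-l})$, that is, equation (1) with $\alpha + \beta x$ replaced by the logit $l$. A Python sketch that reproduces a row to five decimals; the function names are hypothetical:

```python
import math


def antilogit(l: float) -> float:
    """p = 1 / (1 + exp(-l)); for negative l this equals 1 minus the value at -l."""
    return 1.0 / (1.0 + math.exp(-l))


def antilogit_row(tenths: int) -> list:
    """Row of Table 2 for l = tenths/10 + j/100, j = 0..9, to five decimals."""
    return [round(antilogit(tenths / 10 + j / 100), 5) for j in range(10)]
```

For example, `antilogit_row(8)` reproduces the 0.8 row, from .68997 to .70889, and `antilogit(-1.0)` equals 1 minus the tabled .73106, as the heading states.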












































TABLE 3
LOGISTIC WEIGHTS
Upper figure is w = pq; lower figure is wl = pql.
For p less than .50 on left, wl is negative. For p greater than .50 on right, wl is positive.

Thousandths, for p on left
p | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | | p
.00 | .0000 | .0010 | .0020 | .0030 | .0040 | .0050 | .0060 | .0070 | .0079 | .0089 | .0099 | .99
 | | .0069 | .0124 | .0174 | .0220 | .0263 | .0305 | .0344 | .0383 | .0419 | .0455 |
.01 | .0099 | .0109 | .0119 | .0128 | .0138 | .0148 | .0157 | .0167 | .0177 | .0186 | .0196 | .98
 | .0455 | .0489 | .0523 | .0556 | .0587 | .0618 | .0649 | .0678 | .0707 | .0735 | .0763 |
.02 | .0196 | .0206 | .0215 | .0225 | .0234 | .0244 | .0253 | .0263 | .0272 | .0282 | .0291 | .97
 | .0763 | .0790 | .0816 | .0842 | .0868 | .0893 | .0918 | .0942 | .0965 | .0989 | .1012 |
.03 | .0291 | .0300 | .0310 | .0319 | .0328 | .0338 | .0347 | .0356 | .0366 | .0375 | .0384 | .96
 | .1012 | .1034 | .1056 | .1078 | .1099 | .1120 | .1141 | .1161 | .1181 | .1202 | .1220 |
.04 | .0384 | .0393 | .0402 | .0412 | .0421 | .0430 | .0439 | .0448 | .0457 | .0466 | .0475 | .95
 | .1220 | .1239 | .1258 | .1277 | .1295 | .1313 | .1331 | .1348 | .1365 | .1382 | .1399 |
.05 | .0475 | .0484 | .0493 | .0502 | .0511 | .0520 | .0529 | .0538 | .0546 | .0555 | .0564 | .94
 | .1399 | .1415 | .1431 | .1447 | .1463 | .1478 | .1493 | .1508 | .1523 | .1538 | .1552 |
.06 | .0564 | .0573 | .0582 | .0590 | .0599 | .0608 | .0616 | .0625 | .0634 | .0642 | .0651 | .93
 | .1552 | .1566 | .1580 | .1594 | .1607 | .1620 | .1633 | .1646 | .1659 | .1672 | .1684 |
.07 | .0651 | .0660 | .0668 | .0677 | .0685 | .0694 | .0702 | .0711 | .0719 | .0728 | .0736 | .92
 | .1684 | .1696 | .1708 | .1720 | .1731 | .1743 | .1754 | .1765 | .1776 | .1787 | .1798 |
.08 | .0736 | .0744 | .0753 | .0761 | .0769 | .0778 | .0786 | .0794 | .0803 | .0811 | .0819 | .91
 | .1798 | .1808 | .1818 | .1828 | .1838 | .1848 | .1858 | .1867 | .1877 | .1886 | .1895 |
.09 | .0819 | .0827 | .0835 | .0844 | .0852 | .0860 | .0868 | .0876 | .0884 | .0892 | .0900 | .90
 | .1895 | .1904 | .1913 | .1921 | .1930 | .1938 | .1946 | .1954 | .1962 | .1970 | .1977 |
.10 | .0900 | .0908 | .0916 | .0924 | .0932 | .0940 | .0948 | .0956 | .0963 | .0971 | .0979 | .89
 | .1977 | .1985 | .1992 | .2000 | .2007 | .2014 | .2021 | .2027 | .2034 | .2040 | .2047 |
.11 | .0979 | .0987 | .0995 | .1002 | .1010 | .1018 | .1025 | .1033 | .1041 | .1048 | .1056 | .88
 | .2047 | .2053 | .2059 | .2065 | .2071 | .2077 | .2083 | .2088 | .2093 | .2099 | .2104 |
.12 | .1056 | .1064 | .1071 | .1079 | .1086 | .1094 | .1101 | .1109 | .1116 | .1124 | .1131 | .87
 | .2104 | .2109 | .2114 | .2119 | .2124 | .2128 | .2133 | .2137 | .2142 | .2146 | .2150 |
.13 | .1131 | .1138 | .1146 | .1153 | .1160 | .1168 | .1175 | .1182 | .1190 | .1197 | .1204 | .86
 | .2150 | .2154 | .2158 | .2162 | .2165 | .2169 | .2173 | .2176 | .2179 | .2182 | .2186 |
.14 | .1204 | .1211 | .1218 | .1226 | .1233 | .1240 | .1247 | .1254 | .1261 | .1268 | .1275 | .85
 | .2186 | .2189 | .2192 | .2194 | .2197 | .2200 | .2202 | .2205 | .2207 | .2209 | .2212 |
.15 | .1275 | .1282 | .1289 | .1296 | .1303 | .1310 | .1317 | .1324 | .1330 | .1337 | .1344 | .84
 | .2212 | .2214 | .2216 | .2218 | .2219 | .2221 | .2223 | .2224 | .2226 | .2227 | .2229 |
.16 | .1344 | .1351 | .1358 | .1364 | .1371 | .1378 | .1384 | .1391 | .1398 | .1404 | .1411 | .83
 | .2229 | .2230 | .2231 | .2232 | .2233 | .2234 | .2235 | .2236 | .2236 | .2237 | .2237 |
.17 | .1411 | .1418 | .1424 | .1431 | .1437 | .1444 | .1450 | .1457 | .1463 | .1470 | .1476 | .82
 | .2237 | .2238 | .2238 | .2238 | .2239 | .2239 | .2239 | .2239 | .2239 | .2238 | .2238 |
.18 | .1476 | .1482 | .1489 | .1495 | .1501 | .1508 | .1514 | .1520 | .1527 | .1533 | .1539 | .81
 | .2238 | .2238 | .2237 | .2237 | .2236 | .2236 | .2235 | .2234 | .2233 | .2233 | .2232 |
.19 | .1539 | .1545 | .1551 | .1558 | .1564 | .1570 | .1576 | .1582 | .1588 | .1594 | .1600 | .80
 | .2232 | .2231 | .2229 | .2228 | .2227 | .2226 | .2224 | .2223 | .2221 | .2220 | .2218 |
.20 | .1600 | .1606 | .1612 | .1618 | .1624 | .1630 | .1636 | .1642 | .1647 | .1653 | .1659 | .79
 | .2218 | .2216 | .2215 | .2213 | .2211 | .2209 | .2207 | .2205 | .2203 | .2200 | .2198 |
.21 | .1659 | .1665 | .1671 | .1676 | .1682 | .1688 | .1693 | .1699 | .1705 | .1710 | .1716 | .78
 | .2198 | .2196 | .2193 | .2191 | .2188 | .2186 | .2183 | .2180 | .2178 | .2175 | .2172 |
.22 | .1716 | .1722 | .1727 | .1733 | .1738 | .1744 | .1749 | .1755 | .1760 | .1766 | .1771 | .77
 | .2172 | .2169 | .2166 | .2163 | .2160 | .2157 | .2153 | .2150 | .2147 | .2143 | .2140 |
.23 | .1771 | .1776 | .1782 | .1787 | .1792 | .1798 | .1803 | .1808 | .1814 | .1819 | .1824 | .76
 | .2140 | .2136 | .2133 | .2129 | .2126 | .2122 | .2118 | .2114 | .2110 | .2106 | .2102 |
.24 | .1824 | .1829 | .1834 | .1840 | .1845 | .1850 | .1855 | .1860 | .1865 | .1870 | .1875 | .75
 | .2102 | .2098 | .2094 | .2090 | .2086 | .2082 | .2078 | .2073 | .2069 | .2064 | .2060 |
 | | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
Thousandths, for p on right




















TABLE 3 (cont.)
LOGISTIC WEIGHTS

Thousandths, for p on left
p | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | | p
.25 | .1875 | .1880 | .1885 | .1890 | .1895 | .1900 | .1905 | .1910 | .1914 | .1919 | .1924 | .74
 | .2060 | .2055 | .2051 | .2046 | .2041 | .2037 | .2032 | .2027 | .2022 | .2017 | .2012 |
.26 | .1924 | .1929 | .1934 | .1938 | .1943 | .1948 | .1952 | .1957 | .1962 | .1966 | .1971 | .73
 | .2012 | .2007 | .2002 | .1997 | .1992 | .1987 | .1982 | .1976 | .1971 | .1966 | .1960 |
.27 | .1971 | .1976 | .1980 | .1985 | .1989 | .1994 | .1998 | .2003 | .2007 | .2012 | .2016 | .72
 | .1960 | .1955 | .1949 | .1944 | .1938 | .1933 | .1927 | .1921 | .1916 | .1910 | .1904 |
.28 | .2016 | .2020 | .2025 | .2029 | .2033 | .2038 | .2042 | .2046 | .2051 | .2055 | .2059 | .71
 | .1904 | .1898 | .1892 | .1886 | .1880 | .1874 | .1868 | .1862 | .1856 | .1850 | .1844 |
.29 | .2059 | .2063 | .2067 | .2072 | .2076 | .2080 | .2084 | .2088 | .2092 | .2096 | .2100 | .70
 | .1844 | .1837 | .1831 | .1825 | .1818 | .1812 | .1805 | .1799 | .1792 | .1786 | .1779 |
.30 | .2100 | .2104 | .2108 | .2112 | .2116 | .2120 | .2124 | .2128 | .2131 | .2135 | .2139 | .69
 | .1779 | .1773 | .1766 | .1759 | .1753 | .1746 | .1739 | .1732 | .1725 | .1718 | .1711 |
.31 | .2139 | .2143 | .2147 | .2150 | .2154 | .2158 | .2161 | .2165 | .2169 | .2172 | .2176 | .68
 | .1711 | .1704 | .1697 | .1690 | .1683 | .1676 | .1669 | .1662 | .1655 | .1647 | .1640 |
.32 | .2176 | .2180 | .2183 | .2187 | .2190 | .2194 | .2197 | .2201 | .2204 | .2208 | .2211 | .67
 | .1640 | .1633 | .1626 | .1618 | .1611 | .1603 | .1596 | .1588 | .1581 | .1573 | .1566 |
.33 | .2211 | .2214 | .2218 | .2221 | .2224 | .2228 | .2231 | .2234 | .2238 | .2241 | .2244 | .66
 | .1566 | .1558 | .1551 | .1543 | .1535 | .1527 | .1520 | .1512 | .1504 | .1496 | .1488 |
.34 | .2244 | .2247 | .2250 | .2254 | .2257 | .2260 | .2263 | .2266 | .2269 | .2272 | .2275 | .65
 | .1488 | .1481 | .1473 | .1465 | .1457 | .1449 | .1441 | .1433 | .1425 | .1416 | .1408 |
.35 | .2275 | .2278 | .2281 | .2284 | .2287 | .2290 | .2293 | .2296 | .2298 | .2301 | .2304 | .64
 | .1408 | .1400 | .1392 | .1384 | .1376 | .1367 | .1359 | .1351 | .1342 | .1334 | .1326 |
.36 | .2304 | .2307 | .2310 | .2312 | .2315 | .2318 | .2320 | .2323 | .2326 | .2328 | .2331 | .63
 | .1326 | .1317 | .1309 | .1300 | .1292 | .1283 | .1275 | .1266 | .1258 | .1249 | .1241 |
.37 | .2331 | .2334 | .2336 | .2339 | .2341 | .2344 | .2346 | .2349 | .2351 | .2354 | .2356 | .62
 | .1241 | .1232 | .1223 | .1215 | .1206 | .1197 | .1189 | .1180 | .1171 | .1162 | .1153 |
.38 | .2356 | .2358 | .2361 | .2363 | .2365 | .2368 | .2370 | .2372 | .2375 | .2377 | .2379 | .61
 | .1153 | .1145 | .1136 | .1127 | .1118 | .1109 | .1100 | .1091 | .1082 | .1073 | .1064 |
.39 | .2379 | .2381 | .2383 | .2386 | .2388 | .2390 | .2392 | .2394 | .2396 | .2398 | .2400 | .60
 | .1064 | .1055 | .1046 | .1037 | .1028 | .1019 | .1010 | .1001 | .0991 | .0982 | .0973 |
.40 | .2400 | .2402 | .2404 | .2406 | .2408 | .2410 | .2412 | .2414 | .2415 | .2417 | .2419 | .59
 | .0973 | .0964 | .0955 | .0945 | .0936 | .0927 | .0918 | .0908 | .0899 | .0890 | .0880 |
.41 | .2419 | .2421 | .2423 | .2424 | .2426 | .2428 | .2429 | .2431 | .2433 | .2434 | .2436 | .58
 | .0880 | .0871 | .0862 | .0852 | .0843 | .0834 | .0824 | .0815 | .0805 | .0796 | .0786 |
.42 | .2436 | .2438 | .2439 | .2441 | .2442 | .2444 | .2445 | .2447 | .2448 | .2450 | .2451 | .57
 | .0786 | .0777 | .0767 | .0758 | .0748 | .0739 | .0729 | .0720 | .0710 | .0700 | .0691 |
.43 | .2451 | .2452 | .2454 | .2455 | .2456 | .2458 | .2459 | .2460 | .2462 | .2463 | .2464 | .56
 | .0691 | .0681 | .0672 | .0662 | .0652 | .0643 | .0633 | .0623 | .0614 | .0604 | .0594 |
.44 | .2464 | .2465 | .2466 | .2468 | .2469 | .2470 | .2471 | .2472 | .2473 | .2474 | .2475 | .55
 | .0594 | .0584 | .0575 | .0565 | .0555 | .0546 | .0536 | .0526 | .0516 | .0506 | .0497 |
.45 | .2475 | .2476 | .2477 | .2478 | .2479 | .2480 | .2481 | .2482 | .2482 | .2483 | .2484 | .54
 | .0497 | .0487 | .0477 | .0467 | .0457 | .0448 | .0438 | .0428 | .0418 | .0408 | .0398 |
.46 | .2484 | .2485 | .2486 | .2486 | .2487 | .2488 | .2488 | .2489 | .2490 | .2490 | .2491 | .53
 | .0398 | .0388 | .0379 | .0369 | .0359 | .0349 | .0339 | .0329 | .0319 | .0309 | .0299 |
.47 | .2491 | .2492 | .2492 | .2493 | .2493 | .2494 | .2494 | .2495 | .2495 | .2496 | .2496 | .52
 | .0299 | .0289 | .0279 | .0269 | .0260 | .0250 | .0240 | .0230 | .0220 | .0210 | .0200 |
.48 | .2496 | .2496 | .2497 | .2497 | .2497 | .2498 | .2498 | .2498 | .2499 | .2499 | .2499 | .51
 | .0200 | .0190 | .0180 | .0170 | .0160 | .0150 | .0140 | .0130 | .0120 | .0110 | .0100 |
.49 | .2499 | .2499 | .2499 | .2500 | .2500 | .2500 | .2500 | .2500 | .2500 | .2500 | .2500 | .50
 | .0100 | .0090 | .0080 | .0070 | .0060 | .0050 | .0040 | .0030 | .0020 | .0010 | .0000 |
 | | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
Thousandths, for p on right
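The weights of Table 3 follow directly from their definitions, $w = pq$ and $wl = pql$ with $l = \ln(p/q)$. A Python sketch with a hypothetical function name; note that the table prints the magnitude of $wl$, whereas the function returns it with its sign:

```python
import math


def logistic_weights(p: float):
    """Return (w, wl) = (p*q, p*q*ln(p/q)) for 0 < p < 1, as in Table 3.

    The table prints |wl|; the value returned here is negative for p < .50.
    """
    q = 1.0 - p
    w = p * q
    return w, w * math.log(p / q)
```

For example, `logistic_weights(0.25)` gives $w = .1875$ and $wl \approx -.2060$, matching the .25 entries of the table.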













If $x$ is measured as the logarithm of the “dose” $D$, then $x_{50} = \log D_{50}$, where $x_{50}$ is the estimate of $\gamma$, the value of $x$ corresponding to a 50 per cent response, and $D_{50}$ is the estimate of the actual dose producing this response. Formulas for the variances of the estimates of the parameters may be written as follows:⁶








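$$s_{a'}^2 = \frac{1}{\sum nw}, \qquad s_b^2 = \frac{1}{\sum nw\,(x - \bar x)^2},$$

$$s_a^2 = s_{a'}^2 + \bar x^2 s_b^2, \qquad (11)$$

$$s_{x_{50}}^2 = \frac{1}{b^2}\left[s_{a'}^2 + s_b^2\,(x_{50} - \bar x)^2\right], \qquad s_{D_{50}} = s_{x_{50}}\left(\frac{D_{50}}{\log e}\right).$$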

These formulas provide closely accurate estimates of the variances, 
under ideal conditions in which (a) the “true” P’s are given exactly 
by equation (1), (b) the samples are “random” at each dose, the doses 
themselves being fixed quantities, and (c) the number of animals used 
at each dose is large. 
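The formulas (11) can be evaluated as in the following Python sketch. The assumptions are stated in the code: $0 < p < 1$ at every dose, and $x$ taken as the common logarithm of the dose, so that $D_{50} = 10^{x_{50}}$ and $\log e = \log_{10} e \approx 0.4343$. The function name and dictionary keys are hypothetical. As the discussion that follows stresses, the results are formulary (internal) estimates and will generally understate the real error of an assay.

```python
import math


def standard_errors(doses, n_exposed, n_affected, b, x50):
    """Formulary variances of equation (11).

    Assumes 0 < p < 1 at every dose and that x is log10 of the dose, so that
    D50 = 10**x50 and 'log e' is log10(e).  Internal estimates only.
    """
    nw = [n * (r / n) * (1.0 - r / n) for n, r in zip(n_exposed, n_affected)]
    s_nw = sum(nw)
    x_bar = sum(w * x for w, x in zip(nw, doses)) / s_nw
    s2_a_prime = 1.0 / s_nw                                   # variance of a'
    s2_b = 1.0 / sum(w * (x - x_bar) ** 2 for w, x in zip(nw, doses))
    s2_a = s2_a_prime + x_bar ** 2 * s2_b
    s2_x50 = (s2_a_prime + s2_b * (x50 - x_bar) ** 2) / b ** 2
    s_x50 = math.sqrt(s2_x50)
    s_d50 = s_x50 * (10 ** x50) / math.log10(math.e)
    return {"s2_a_prime": s2_a_prime, "s2_b": s2_b, "s2_a": s2_a,
            "s2_x50": s2_x50, "s_x50": s_x50, "s_D50": s_d50}
```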

The conditions (a) and (b) can be maintained with satisfactory simil- 
itude in sampling experiments, set up with the use of random numbers 
or the like, mechanical shuffling of cards which have been appropri- 
ately prepared, or where equivalent experimental conditions have been 
deliberately and carefully arranged [4]. However, in the experiments 
with bio-assay as actually performed in the laboratory, this is impossi- 
ble. In the first place, the “assumption” that the “true” P’s follow 
exactly some specified function such as (1), or the equivalent statistical 
“assumption” that the sampled p’s approach these P’s with probability 
approaching 1 as $n \to \infty$, is, of course, only an idealization, employed
to establish a working model, and can be expected, at best, to be only 
approximately true in fact. But more importantly, even if this ap- 
proximation is close enough to be considered precise, each of the many 
different manipulations involved in accomplishing a bio-assay, such as 
the preparation of specified dosage concentrations, the administration 
of the drug (for instance by spraying of insecticides or injection of toxic 
drugs into individual animals), as well as the unstable behavior of the 
animals from instant to instant, results in variations that have the 
same effect as “errors.” These errors, even when no animal experi- 
ments are involved, are frequently large [2]. All of them influence to 
a small or great degree the variation of the bio-assay, and their net 





⁶ They may be derived as the asymptotic variances, with estimates then substituted for the parameters [13, 45]. In the case of $s_{D_{50}}$, the formula is based on the frequently used approximation $\sigma(\ln x) = \sigma_x/\bar x$, where $\bar x$ represents an average.













effect is to increase the error of the assay beyond the values given by 
the formulas (11). 

As respects the condition (c), in the practice of bio-assay the numbers 
of animals usually are not large, and even aside from the errors of 
dosage referred to, the formulas (11), since they contain estimates of 
the parameters $\alpha$, $\beta$ instead of the parameters, are only estimates of
the asymptotic variances, and themselves have a statistical error that 
is quite large. 

Only by direct observation of bio-assays of the same drug, with a 
program of repeated experiments so designed as to include all the 
sources of variation involved in the bio-assay as ordinarily made, can 
the actual error be evaluated. Very few experiments meeting these re- 
quirements have been performed, but such investigations as have been 
made indicate that the real error of the bio-assay is generally consider- 
ably larger than the values given by the formulary estimates [15, 30]. 
Paradoxically, the discrepancy between actual error and the formulary 
values generally increases with the sample size, because while the 
formulas indicate decreasing error with increasing n, many of the ex- 
perimental errors are not reduced with increase in the number of 
animals in the individual experiment, but only with increase of the 
number of independent experiments [5, 16]. 
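The paradox can be made concrete with a deliberately simple variance model, an assumption made here only for illustration: each independent experiment carries a component $\sigma_B^2$ (from dosage preparation, administration and the like) that does not shrink with the number of animals $n$, while the binomial part, which is all that the formulas estimate, behaves like $c/n$ for some constant $c$. Then

$$\operatorname{Var}_{\text{actual}} = \sigma_B^2 + \frac{c}{n}, \qquad \operatorname{Var}_{\text{formula}} = \frac{c}{n}, \qquad \frac{\operatorname{Var}_{\text{actual}}}{\operatorname{Var}_{\text{formula}}} = 1 + \frac{n\,\sigma_B^2}{c},$$

so the ratio of actual to formulary variance grows in proportion to $n$, whereas averaging $m$ independent experiments divides the whole of $\sigma_B^2 + c/n$ by $m$.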

These remarks are made in order to serve as a warning that the 
formulary calculated errors, sometimes referred to as “internal es- 
timates,” must be used with great caution, and that no conclusions 
regarding comparative assays that depend on the assumption that 
these estimates are measures of the actual error are reliable unless this 
assumption is checked independently by experiment. The formulary 
estimates are nonetheless frequently useful, even essential, serving as 
a minimum baseline from which calculations can be made, and their 
evaluation is therefore illustrated here for the examples presented. 

Following are four examples illustrating the use of the minimum
logit χ² method in different situations such as are met in practice.
The discussion of various points which accompanies the examples is to
be read as part of the definition of the minimum logit χ² method as here
advanced. Following the examples are three appendix notes, which also
are to be read as a definitive part of the present essay.


EXAMPLE 1. GENERAL CASE; STRAIGHT LINE TRANSFORM; NUMBER
OF SIGNIFICANT FIGURES; CALCULATION OF χ²


The use of the straight-line equation to represent the logistic func-
tion in the minimum logit χ² method is a particular example of a







ESTIMATING THE BIO-ASSAY WITH QUANTAL RESPONSE 575 


method of fitting that is very old. If a function $Y = f(x, \theta_1, \theta_2, \ldots)$ is
to be fitted, where the θ’s represent parameters, this method consists
of finding some function Y′ of Y which is a linear function of x and
fitting Y′ against x as a straight line. Practical texts on curve fitting
formerly used this principle almost exclusively.
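The following is a minimal sketch of the straight-line transform for the logistic case. It assumes that the logistic function (1) has the form P = 1/(1 + e^−(α+βx)). The values of α and β and the dose levels below are hypothetical. Points that lie on a sigmoid curve in x fall exactly on a straight line once each P is replaced by its logit, ln[P/(1 − P)].

```python
import math


def logistic(x: float, alpha: float, beta: float) -> float:
    """Logistic function P = 1 / (1 + exp(-(alpha + beta * x)))."""
    return 1.0 / (1.0 + math.exp(-(alpha + beta * x)))


def logit(p: float) -> float:
    """Straight-line transform Y' = ln(p / (1 - p)) of a proportion p."""
    return math.log(p / (1.0 - p))


# Hypothetical parameters and dose levels, for illustration only.
alpha, beta = -1.9, 7.0
xs = [0.0, 0.15, 0.30, 0.45, 0.60]

ps = [logistic(x, alpha, beta) for x in xs]   # a sigmoid curve in x
ls = [logit(p) for p in ps]                   # the same points, transformed

# Slopes between successive transformed points are all equal to beta,
# so the transformed points lie on one straight line.
slopes = [(ls[i + 1] - ls[i]) / (xs[i + 1] - xs[i]) for i in range(len(xs) - 1)]
print([round(s, 6) for s in slopes])   # [7.0, 7.0, 7.0, 7.0]
```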

It hardly can be doubted that one of the chief reasons for the wide 
use of this method is the opportunity that it affords for the efficient 
utilization of graphic analysis as an adjunct to algebraic treatment. 
In many situations, there is no more effective simple method of testing 
the fitness of a proposed function to a set of observed data than to 
plot the data and the function and to look at the fit. And if the function 
can be put in such a form that, if the function really fits, the plot will 
be linear, then there is an inestimable practical gain. One is able to 
discern systematic deviations from the function which reflect what may 
be an important departure of the observations from hypothesis, when 
the application of formal statistical tests may fail to do so. At the 
same time, the plotted graph serves as a “control chart” for the identi- 
fication of points that fall off the trend so far that they are to be 
suspected of being out of “statistical control” and to be discarded. Bliss
and also Finney, though they make good use of available statistical
tests, also discard points that appear clearly to be far off the trend,
in my opinion a sensible and statistically sound practice. Even so
advanced a mathematical treatise as that of Cramér notes that waves in
the observations may be discerned in a graphical representation which
are not reflected in a significant χ² deviation, and Gaddum [27] makes
the point specifically in connection with bio-assay. This particular
capacity to judge fitness by eye, when the function is linear, is generally
taken for granted, but when reflected upon, appears to be a most re- 
markable psychologic phenomenon, to which, so far as I know, no 
study has been applied. There is no equal capacity to judge by eye 
from a graph whether a function is, say, exponential or parabolic, even 
as there is no instrument comparable to a straight edge that one can 
lay down among the plotted points, to judge whether they fall along 
an exponential curve rather than a parabolic curve, and having laid 
it down, to read on the instrument an estimate of the exponential 
parameters. It is no wonder that in older books of statistics, curve
fitting is, almost by definition, the use of a straight-line transform and 
the accomplishment of a straight-line fit. A great practical help in the 
use of graphic treatment is the availability of appropriately scaled 
printed graph sheets, and for the more ready use of graphic analysis 
with the logistic function I have had printed two graph sheets for the 













convenient plotting of the linear logit relation. These are illustrated in 
Figure 1 in connection with Example 1, to follow; the legend attached 
to that figure explains the method of using these sheets.⁷

Table 4 shows the details of calculation for a general case. Estimates 
may be given according to the following rule: Carry the estimated 
standard error of the estimate to two significant figures and the estimate 
itself to the number of decimal places given in the error so written. 
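The calculation of Table 4 and the rounding rule above can be sketched together. This is a minimal illustration, not the author's procedure, and the function names are our own. The sketch makes three assumptions:

- The fit is weighted least squares of the observed logits l = ln(p/q) with weights nw = npq.
- The variances are the same expressions used in this section's tables: s²(l̄) = 1/Σnw, s²(b) = 1/Σnw(x − x̄)², s²(a) = s²(l̄) + x̄²s²(b), and s²(x₅₀) = [s²(l̄) + (x₅₀ − x̄)²s²(b)]/b².
- The weights are computed directly from the proportions as printed, rather than read from Table 3, so the last digits of intermediate sums may differ slightly from a hand computation.

```python
import math


def min_logit_chi2_line(x, n, p):
    """Fit l = a + b*x by minimum logit chi-square: weighted least squares of
    the observed logits l = ln(p/q), with weights nw = n*p*q."""
    nw = [ni * pi * (1 - pi) for ni, pi in zip(n, p)]
    l = [math.log(pi / (1 - pi)) for pi in p]
    sw = sum(nw)
    xbar = sum(w * xi for w, xi in zip(nw, x)) / sw
    lbar = sum(w * li for w, li in zip(nw, l)) / sw
    sxx = sum(w * (xi - xbar) ** 2 for w, xi in zip(nw, x))
    sxl = sum(w * (xi - xbar) * (li - lbar) for w, xi, li in zip(nw, x, l))
    b = sxl / sxx
    a = lbar - b * xbar
    x50 = -a / b
    var_lbar, var_b = 1 / sw, 1 / sxx          # s^2 of l-bar and of b
    return {
        "a": (a, math.sqrt(var_lbar + xbar ** 2 * var_b)),
        "b": (b, math.sqrt(var_b)),
        "x50": (x50, math.sqrt((var_lbar + (x50 - xbar) ** 2 * var_b) / b ** 2)),
    }


def report(estimate, se):
    """Rounding rule: standard error to two significant figures, estimate
    to the same number of decimal places as the rounded error."""
    se = float(f"{se:.2g}")
    decimals = max(0, 1 - math.floor(math.log10(se)))
    return f"{estimate:.{decimals}f} ± {se:.{decimals}f}"


# Example 1: coded dose x = log(D/D1), n animals, proportion p responding.
x = [0.000, 0.164, 0.292, 0.471, 0.593]
n = [50, 48, 46, 49, 50]
p = [0.120, 0.333, 0.522, 0.857, 0.880]

for name, (est, se) in min_logit_chi2_line(x, n, p).items():
    print(f"{name:>3} = {report(est, se)}")
```

With these inputs the slope and x₅₀ print as 7.06 ± 0.88 and 0.270 ± 0.023. These values are consistent with the fitted line l̂ = 7.06x − 1.91 of Figure 1.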

The same principle may be used to govern the number of significant 
figures which should be carried at various points in the computations. 
A sufficient number should be retained throughout, to insure that the 
estimates as finally set down are definitively determined. A sine qua non
of any statistical procedure responsibly advanced for general use is 
that it be so defined that two workers with the same data will arrive 
at the same result, to the degree of precision retained in its final 
promulgation. 

The number of significant figures retained in the computations of the 
following examples has been determined by trial, to insure that the 
modest standards of precision set down be fulfilled. However, no rigid 
rules applying to all cases can be given, and it is possible that some 





7 Logit scaled graph papers have been printed privately from time to time, among which should 
be mentioned the early one of Wilson [43]. 







Fig. 1. Plot of observations and fitted logit line for the data of Example 1, on
logit graph sheets. These sheets can be purchased from the Codex Book Company,
Norwood, Massachusetts.

In the upper figure (a) is illustrated the use of a sheet on which the abscissa 
is scaled arithmetically. Since the function is related to the logarithm of the dose, 
the logarithms are scaled on this co-ordinate. The practice is followed of first 
expressing the dosages as fractions of the smallest amount used (D’) so that the 
first dose in this scale is always unity and its logarithm zero. The percentage
response is scaled on the left ordinate. On the right ordinate is scaled the logit of
the response; this is useful in various ways, as for instance when plotting the
fitted line l̂ = 7.06x − 1.91, the values of l̂ can be located directly without
transposition to percentages.

In the lower figure (b) the same data are plotted on a sheet which has four 
logarithmic scales, to accommodate the sheet to various ranges of dosage. The 
initial dosage being unity, the smallest over-all scale is used which will accom- 
modate all the dosages in the experiment; in the present case, since the highest 
dosage is 3.92, it is the second scale. The dosages D’ are located directly on this 
scale, and the data plotted accordingly. If only a single range logarithmic scale is 
provided, to cover the widest range found in practice, as in most published papers, 
the advantage of the logarithmic scale is frequently offset by having the entire 
graph compressed to a small fraction of the sheet, thus negating the main purpose 
of securing a good graphic representation. 










[Figure 1, panels (a) and (b): observations and fitted logit line for Example 1 plotted on logit graph sheets; left ordinate PERCENTAGE, right ordinate LOGIT.]














TABLE 4. EXAMPLE 1
GENERAL CASE. DOSES NOT IN CONSTANT PROPORTION. n'S NOT EQUAL AT VARIOUS DOSES
TOXICITY OF ROTENONE (FROM FINNEY [22])

Calculation of a, b, x₅₀

Conc. D,  D/D₁   x = log D/D₁   Deaths r   Total n   Proportion died p = r/n   nw*       nwl*       nwx
mg./l.
2.6       1.00   0.000           6         50        0.120                      5.2800   −10.5200   0
3.8       1.46   0.164          16         48        0.333                     10.6608    −7.4064   1.7484
5.1       1.96   0.292          24         46        0.522                     11.4770     1.0120   3.3513
7.7       2.96   0.471          42         49        0.857                      6.0074    10.7506   2.8295
10.2      3.92   0.593          44         50        0.880                      5.2800    10.5200   3.1310

* w and wl from Table 3.

Σnw = 38.7052    Σnwx = 11.0602    Σnwl = 4.3562
x̄ = Σnwx/Σnw = 0.2858    l̄ = Σnwl/Σnw = 0.1125
Σnwx² = 4.4547    Σnwlx = 10.3827
(Σnwx)²/Σnw = 3.1605    (Σnwx)(Σnwl)/Σnw = 1.2448
Σnw(x − x̄)² = 1.2942    Σnw(x − x̄)(l − l̄) = 9.1379
b = Σnw(x − x̄)(l − l̄)/Σnw(x − x̄)² = 7.0607    a = l̄ − bx̄ = −1.9051
x₅₀ = −a/b = 0.2698    log D₅₀ = x₅₀ + log D₁ = 0.2698 + 0.4150 = 0.6848    D₅₀ = 4.84
s²_l̄ = 1/Σnw = 0.02584
s²_b = 1/Σnw(x − x̄)² = 0.7727    s_b = 0.88
s²_a = s²_l̄ + x̄²s²_b = 0.0890    s_a = 0.30
s²_x₅₀ = (1/b²)[s²_l̄ + (x₅₀ − x̄)²s²_b] = 0.000522    s_x₅₀ = 0.023
s_D₅₀ = 2.3026 D₅₀ s_x₅₀ = 0.25

a = −1.91 ± 0.30    b = 7.06 ± 0.88    x₅₀ = 0.270 ± 0.023    D₅₀ = 4.84 ± 0.25

Calculation of χ²

x       l         l̂ = a + bx   (l − l̂)²   nw(l − l̂)²
0.000   −1.9924   −1.9051      0.0076     0.0401
0.164   −0.6946   −0.7471      0.0028     0.0299
0.292   +0.0881   +0.1566      0.0047     0.0539
0.471   +1.7906   +1.4205      0.1370     0.8230
0.593   +1.9924   +2.2819      0.0838     0.4425
χ² = Σnw(l − l̂)² = 1.3894







problems might be encountered in which the number of significant 
figures used in the examples would not be sufficient, while in others a 
smaller number would have sufficed. 

Rules for retaining significant figures differ in different laboratories. 
In routine calculations I maintain for intermediate calculations 6 
decimal places, except where more are necessary to have a minimum of 
6 significant figures. The rule is followed “blindly” without adjustment 
for consistency at the various steps of the interim calculations. Adjust- 
ment is made in the final statement of results, in accordance with the 
principle stated above of retaining two significant figures in the es- 
timated standard error, and decimal consistency with this so far as the 
estimates are concerned. I have found this procedure much more satis- 
factory than attempting to adjust the number of figures specifically 
for the various particular steps of the computation. In the examples, 
the rules just described have been followed, except that a minimum of 
4 decimal or significant figures, rather than 6, has been maintained. 
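The working rule just described can be written as a small helper. This is our own sketch, and the function name is hypothetical. It keeps 6 decimal places, but adds places whenever 6 decimal places would leave fewer than 6 significant figures. The keyword arguments allow the 4-figure minimum used in the examples.

```python
import math


def intermediate_round(value: float, places: int = 6, min_sig: int = 6) -> float:
    """Round an intermediate result to `places` decimal places, carrying more
    places when needed so that at least `min_sig` significant figures remain."""
    if value == 0:
        return 0.0
    leading = math.floor(math.log10(abs(value)))   # position of first significant digit
    decimals = max(places, min_sig - 1 - leading)
    return round(value, decimals)


print(intermediate_round(38.705248))        # 6 places suffice: 38.705248
print(intermediate_round(0.000522198765))   # needs 9 places: 0.000522199
# The examples in this paper carry a minimum of 4 figures instead of 6:
print(intermediate_round(1 / 38.7052, places=4, min_sig=4))   # 0.02584
```

The rule is applied mechanically at every step, as the text recommends. Adjustment to the reporting rule happens only when the final results are written down.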


Calculation of χ²

The Pearson χ² is given by

$$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}. \qquad (12)$$

In the present situation

n = number of animals at x
r = number of animals at x affected
s = n − r, number of animals at x not affected
p = r/n, proportion of animals at x affected
q = 1 − p = s/n, proportion of animals at x not affected
p̂ = “expected” value of p at x, obtained by inserting the estimates a, b for α, β in the logistic function (1)
q̂ = 1 − p̂, “expected” value of q at x
p̂n = “expected” number of animals at x affected
q̂n = “expected” number of animals at x not affected.

Hence

$$\chi^2 = \sum \left[ \frac{(r - \hat{p}n)^2}{\hat{p}n} + \frac{(s - \hat{q}n)^2}{\hat{q}n} \right]. \qquad (13)$$

The χ² can be evaluated directly from (13). However, since $s - \hat{q}n = -(r - \hat{p}n)$, it is easily shown that (13) is identically equal to

$$\chi^2 = \sum \frac{n}{\hat{p}\hat{q}}\,(p - \hat{p})^2. \qquad (14)$$
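The identity between (13) and (14) can be checked numerically. The sketch below is a minimal check, and the function names are our own. It assumes that the logistic function (1) gives p̂ = 1/(1 + e^−l̂). It uses n, p and the fitted logits l̂ listed for Example 1 in Table 5.

```python
import math


def antilogit(l: float) -> float:
    """Logistic function: expected proportion p-hat from a logit l-hat."""
    return 1.0 / (1.0 + math.exp(-l))


def chi2_eq13(n, p, p_hat):
    """Pearson chi-square from observed and expected counts, eq. (13)."""
    total = 0.0
    for n_i, p_i, ph in zip(n, p, p_hat):
        r, s = n_i * p_i, n_i * (1 - p_i)        # observed affected / not affected
        e_r, e_s = n_i * ph, n_i * (1 - ph)      # expected counts p-hat*n, q-hat*n
        total += (r - e_r) ** 2 / e_r + (s - e_s) ** 2 / e_s
    return total


def chi2_eq14(n, p, p_hat):
    """Equivalent form, eq. (14): sum of n/(p-hat q-hat) * (p - p-hat)^2."""
    return sum(n_i / (ph * (1 - ph)) * (p_i - ph) ** 2
               for n_i, p_i, ph in zip(n, p, p_hat))


# Example 1 (Table 5): n, observed p, and fitted logit l-hat = a + b*x.
n = [50, 48, 46, 49, 50]
p = [0.120, 0.333, 0.522, 0.857, 0.880]
l_hat = [-1.9051, -0.7471, 0.1566, 1.4205, 2.2819]
p_hat = [antilogit(l) for l in l_hat]

print(f"{chi2_eq13(n, p, p_hat):.2f} {chi2_eq14(n, p, p_hat):.2f}")   # 1.40 1.40
```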
















Since (14) is easier to compute than (13), a direct calculation of χ² can appropriately employ (14). Applied to the present example, Table 5 shows this calculation.

What has been calculated is the χ² directly as defined. However, the logit χ² is a very close approximation to the Pearson χ², and therefore χ² can be obtained by calculating

$$\chi^2(\text{logit}) = \sum npq\,(l - \hat{l})^2 = \sum nw\,(l - \hat{l})^2. \qquad (15)$$

Formula (15) does not require computing p̂ from l̂ or evaluating n/p̂q̂, because the replacement quantity npq = nw has already been calculated in computing the estimates. For this reason the evaluation of χ² from the approximate formula (15) is considerably easier than the computation from (14). The calculation of χ² by this method is shown in Table 4, where it is incorporated in the calculation of the estimates themselves. As is seen, the value for χ² computed in this way (1.39) is very close to that obtained by direct computation (1.40).
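The approximate logit χ² of (15) needs only three quantities: the observed logits, the weights npq, and the fitted logits. The following minimal sketch uses the Example 1 values. The function name is our own, and the logits and weights are computed directly from p as printed rather than read from a table.

```python
import math

# Example 1: n, observed p, and fitted logits l-hat = a + b*x (Table 5).
n = [50, 48, 46, 49, 50]
p = [0.120, 0.333, 0.522, 0.857, 0.880]
l_hat = [-1.9051, -0.7471, 0.1566, 1.4205, 2.2819]


def logit_chi2(n, p, l_hat):
    """Approximate chi-square, eq. (15): sum of n*p*q*(l - l_hat)^2,
    using the observed weights w = p*q already needed for the estimates."""
    total = 0.0
    for n_i, p_i, lh in zip(n, p, l_hat):
        l = math.log(p_i / (1 - p_i))          # observed logit
        total += n_i * p_i * (1 - p_i) * (l - lh) ** 2
    return total


print(round(logit_chi2(n, p, l_hat), 2))   # 1.39
```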


TABLE 5. EXAMPLE 1 (cont.)
DIRECT CALCULATION OF χ²

x       n    p       l̂         p̂*        (p − p̂)²   p̂q̂       n/p̂q̂    (n/p̂q̂)(p − p̂)²
0.000   50   0.120   −1.9051   0.12953   0.00009    0.11275   443.5    0.0399
0.164   48   0.333   −0.7471   0.32145   0.00013    0.21812   220.1    0.0286
0.292   46   0.522   +0.1566   0.53907   0.00029    0.24847   185.1    0.0537
0.471   49   0.857   +1.4205   0.80542   0.00266    0.15672   312.7    0.8318
0.593   50   0.880   +2.2819   0.90737   0.00075    0.08405   594.9    0.4462
                                                                      Total 1.4002

χ² = 1.4.

* Obtained from antilogit Table 2 by linear interpolation. The value of p̂ to the accuracy given in the table can be obtained for the logit with two additional decimal places, by linear interpolation, over the entire range of the table.


EXAMPLE 2. EQUAL NUMBERS AT ALL DOSES; DOSAGE CONCENTRATIONS
IN CONSTANT PROPORTION; ZERO OR 100 PER CENT OBSERVATIONS


If n, the number of animals used, is the same at each concentration,
it can be seen from the normal equations (4) and (5) that n can be
eliminated, and for the purpose of estimation we may take its value
to be unity at each dose, thus simplifying the calculations.
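A minimal sketch of this point, using the Example 2 proportions (the function name is our own): multiplying every weight by the same n leaves the estimates a and b unchanged, because n cancels from the ratios that define x̄, l̄ and b. The variances are a different matter: n does not cancel from them, and it must be restored there, for example s²(l̄) = 1/(nΣw).

```python
import math


def fit_line(x, weights, logits):
    """Weighted least-squares line l = a + b*x, as in minimum logit chi-square."""
    sw = sum(weights)
    xbar = sum(w * xi for w, xi in zip(weights, x)) / sw
    lbar = sum(w * li for w, li in zip(weights, logits)) / sw
    sxx = sum(w * (xi - xbar) ** 2 for w, xi in zip(weights, x))
    sxl = sum(w * (xi - xbar) * (li - lbar) for w, xi, li in zip(weights, x, logits))
    b = sxl / sxx
    return lbar - b * xbar, b


# Example 2 (Table 6): coded doses and observed proportions, n = 50 at every dose.
x = [0, 1, 2, 3, 4, 5]
p = [0.120, 0.140, 0.660, 0.780, 0.900, 0.990]
logits = [math.log(pi / (1 - pi)) for pi in p]
w = [pi * (1 - pi) for pi in p]                          # weights with n taken as 1

a1, b1 = fit_line(x, w, logits)
a50, b50 = fit_line(x, [50 * wi for wi in w], logits)    # weights n*w with n = 50
print(f"n = 1:  a = {a1:.4f}, b = {b1:.4f}")
print(f"n = 50: a = {a50:.4f}, b = {b50:.4f}")           # same estimates
```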

In many bio-assay experiments, the dosages are made up by successive dilutions in the same proportion, so that the ratio of the concentration of each dose to the next smaller dose is constant. If the logistic function (1) is considered to hold in relation to the logarithm of the dose, rather than to the dose itself, then in the logarithmic measure the doses will increase arithmetically. In such situations it is possible to “code” the dosage simply, in successive equally spaced integers. This not only facilitates computation but also increases its accuracy, since the logarithm itself will ordinarily be used only to a small number of decimal places and therefore will not always be precisely correct. Suppose the constant of proportionality is k, and the lowest dose is symbolized D₁. Then the successive concentrations will be D₁, kD₁, k²D₁, …, kˢD₁, and their logarithms will be log D₁, (log D₁ + log k), (log D₁ + 2 log k), …, (log D₁ + s log k). If we code x as

$$x = \frac{\log D - \log D_1}{\log k} = \log_k \frac{D}{D_1},$$

the successive values of x will be 0, 1, 2, …, s, which is the coding to be used. Even if only most, and not all, of the dosages progress in some constant proportion, it is still desirable to use such a coding, because of the greater precision of the resulting calculations. The example (Table 6) will illustrate the use of this facilitation in computation.
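The coding x = log(D/D₁)/log k and the conversion back to the dose scale can be sketched as follows. The sketch uses the Table 6 doses, where k = 2, and the value x₅₀ = 1.8584 reported there. The function name is our own.

```python
import math


def code_doses(doses, k):
    """Code doses in constant ratio k as x = log(D/D1)/log(k), D1 the lowest dose."""
    d1 = min(doses)
    return [math.log(d / d1) / math.log(k) for d in doses]


doses = [0.0625, 0.125, 0.25, 0.5, 1.0, 2.0]      # Table 6, mg.
x = code_doses(doses, k=2)
print([round(xi, 10) for xi in x])                 # [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]

# Back to the dose scale: log D50 = x50 log k + log D1 (x50 from Table 6).
x50 = 1.8584
log_d50 = x50 * math.log10(2) + math.log10(doses[0])
print(round(log_d50, 4), round(10 ** log_d50, 4))  # -0.6447 0.2266
```

The value −0.6447 is the same number as 1̄.3553 in bar notation, and it gives D₅₀ = 0.2266 mg.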

The example is taken from a study by Irwin and Cheeseman [30] 
which employed seven doses, but I have not used the observation of 
100 per cent at the seventh dose, which followed an observation of 100 
per cent at the sixth dose. The omission of the final observation of 100 
per cent mortality may be deemed arbitrary, but it is based upon the 
following considerations. 

The model used, according to which the “true” P’s follow the logistic 
function, can be only approximately correct, as has been remarked 
previously, but for many situations it may be considered precise 
enough to serve as the working basis for obtaining the required es- 
timates. This is sound, however, only as a general appraisal; in some 
respects the model is more unrealistic than in others. In one respect it 
is obviously false: According to the logistic model (1), it is necessary
to have an infinitely large dose for P = 100 per cent and a dose of zero
(log dose = −∞) for P = 0; this is unrealistic, of course. We are dealing
with all-or-none response in animals. It is characteristic in pharmaco- 
logic experience that one must increase the dosage to some specified 
dose before any animal will show the “all” response, and that beyond 
a certain dosage all animals will show this response. Actually, a dosage 
less than that corresponding to the E.D. 50 by a relatively small 













amount may be quite innocuous, while the dosage above which all ani- 
mals respond may be larger than the E.D. 50 by only a small amount, 
and is never an inordinately large amount. So far is this true in many 
cases, that the potency is often defined as the “minimum lethal dose,” 
this being the dose below which none, and above which all, animals die. 

This sort of standardization is by no means always as foolish as on 
occasions it has been made out to be, as a personal experience taught 
me. Working with certain drugs, I found that the slope of the dosage 
mortality curve was so high that the zero and 100 per cent points (or 
their close approximations) covered a range of dosage so small that ex- 
periments could not be controlled effectively with dosage varied within 
that range. In a practical sense, the interval between the dose below which all true P’s were
zero and the dose above which all true P’s were 100 per cent was so narrow


TABLE 6. EXAMPLE 2
DOSES IN CONSTANT PROPORTION. EQUAL n (50) FOR ALL DOSES.
TOXIN, BACTERIUM TYPHI MURIUM (FROM IRWIN AND CHEESEMAN [30])*

Dose D, mg.   x (coded)   Deaths r   Proportion died p = r/n   w†       wl†       wx
0.0625        0            6         0.120                     0.1056   −0.2104   0
0.125         1            7         0.140                     0.1204   −0.2186   0.1204
0.25          2           33         0.660                     0.2244   +0.1488   0.4488
0.5           3           39         0.780                     0.1716   +0.2172   0.5148
1.0           4           45         0.900                     0.0900   +0.1977   0.3600
2.0           5           50         0.990                     0.0099   +0.0455   0.0495

* Total experiment omitting observation at D = 4.00.
† From Table 3.

Σw = 0.7219    Σwx = 1.4935    Σwl = 0.1802
x̄ = Σwx/Σw = 2.0688    l̄ = Σwl/Σw = 0.2496
Σwx² = 4.2499    Σwxl = 1.7489
(Σwx)²/Σw = 3.0898    (Σwx)(Σwl)/Σw = 0.3728
Σw(x − x̄)² = 1.1601    Σw(x − x̄)(l − l̄) = 1.3761
b = Σw(x − x̄)(l − l̄)/Σw(x − x̄)² = 1.1862    a = l̄ − bx̄ = −2.2044
x₅₀ = −a/b = 1.8584    log D₅₀ = x₅₀ log 2 + log D₁ = 1̄.3553    D₅₀ = 0.2266
s²_l̄ = 1/(nΣw) = 0.02770
s²_b = 1/[nΣw(x − x̄)²] = 0.01724    s_b = 0.13
s²_a = s²_l̄ + x̄²s²_b = 0.1015    s_a = 0.32
s²_x₅₀ = (1/b²)[s²_l̄ + (x₅₀ − x̄)²s²_b] = 0.02023    s_x₅₀ = 0.14

a = −2.20 ± 0.32    b = 1.19 ± 0.13    x₅₀ = 1.86 ± 0.14

l̂ = 1.19x − 2.20










that it could be considered infinitesimal. To consider all observations 
of 100 per cent as random samples from true P’s which are never 100 
per cent, as statistical theory of quantal response does, is quite in 
contradiction of the known biologic facts. Where theory and fact con- 
trast so violently, it is foolhardy to press the theory very hard. It is the 
considered opinion of the present writer that observations with zero or 
100 per cent should not be used at all—that when they occur, another 
experiment should be performed with different dosages at values where 
observations of zero or 100 per cent are very unlikely—but this is 
probably an extreme position that will not be generally acceptable. 
I may point out, however, that in the widely advocated Karber [18,
30] method of estimating the L.D. 50, if an observation of zero per cent
mortality is made at two successive doses, only the larger of the two doses
is used, and similarly only the smallest of several consecutive doses
which show 100 per cent mortality is considered in making the calcu-
lation. There seems to be something unreasonable in never using
certain observations in one good method of estimation and always using
them in another. My suggestion, then, is to use at most one such
observation at each extremity.

I am, as just explained, disposed against using observations of zero 
or 100 per cent response in estimation in bio-assay. However, such ob- 
servations may occur, and provision must be made for utilizing them. 
For an observation p = 0 or p = 1, the corresponding logit is infinite
and the weight pq is zero, while the weighted logit pql approaches the limit
zero as p → 0 or p → 1. If for these observations we use for pql the
limiting value zero (a dubious procedure mathematically), the observa-
tions are effectively eliminated and hence actually are not used. One
method for dealing with these observations is similar to that used in
the iterative procedures of the probit method. A preliminary estimate
may be made using all the observations except those of zero or 100 per
cent. The value of p “predicted” by this fit at the values of x corre-
sponding to observations of zero or 100 per cent is used to replace the
observation, and a minimum logit χ² estimate is made using this work-
ing observation. The use of this method mars the elementary simplicity
of the minimum logit χ² estimate, when a zero or 100 per cent observa-
tion occurs, to the degree that it requires what amounts to one
iteration. It should be noted, however, that only one “iteration” is required,
and that the result is a definitive solution, which is quite a different 
situation from the one obtaining in the maximum likelihood probit pro- 
cedures, where an undefined number of iterations is required and where 
the solution is not definitive, if a graphic fit by eye was used for the 
original estimate. 










Another method of dealing with the zero or 100 per cent observation 
is to employ an old “empirical” rule, according to which one uses for 
zero a working observation 1/(2n), and for 100 per cent a working
observation (2n − 1)/(2n). In order to determine the relative merits of the
two methods proposed, I performed some sampling experiments simulat-
ing a situation in bio-assay in which zero and 100 per cent would be rela-
tively frequent. In all cases I found that the error of estimate when the
“empirical” rule was used was smaller than that obtained when the
“iterative” scheme was used. Thus in the present situation the easier
method is also the more precise, a situation similar to one
found in comparing the minimum logit χ² estimate with the maximum
likelihood estimate. The definitive procedure now for dealing with
zero and 100 per cent observations in the minimum logit χ² method is
therefore to use the rule of substituting 1/(2n) for zero and (2n − 1)/(2n)
for 100 per cent observations.
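Both ways of forming a working observation for 0 or 100 per cent can be sketched side by side. The function names are our own. The data are the Irwin and Cheeseman counts of Example 2 including the seventh dose (50 animals per dose, 100 per cent at the last two doses). The sketch only shows how each rule produces the working p. It does not reproduce the sampling experiments described above. In either case, the minimum logit χ² fit is then made with the working observations in place of 0 or 1.

```python
import math


def empirical_working_p(r, n):
    """'Empirical' rule: 1/(2n) for 0 per cent, (2n - 1)/(2n) for 100 per cent."""
    if r == 0:
        return 1 / (2 * n)
    if r == n:
        return (2 * n - 1) / (2 * n)
    return r / n


def fit_line(x, w, l):
    """Weighted least-squares line l = a + b*x."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    lbar = sum(wi * li for wi, li in zip(w, l)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxl = sum(wi * (xi - xbar) * (li - lbar) for wi, xi, li in zip(w, x, l))
    b = sxl / sxx
    return lbar - b * xbar, b


def iterative_working_p(x, r, n):
    """Alternative scheme: fit without the 0 and 100 per cent points, then use
    the p predicted by that preliminary line as the working observation."""
    keep = [i for i in range(len(x)) if 0 < r[i] < n[i]]
    p = {i: r[i] / n[i] for i in keep}
    a, b = fit_line([x[i] for i in keep],
                    [n[i] * p[i] * (1 - p[i]) for i in keep],
                    [math.log(p[i] / (1 - p[i])) for i in keep])
    return [p[i] if i in p else 1 / (1 + math.exp(-(a + b * x[i])))
            for i in range(len(x))]


# Irwin and Cheeseman data, seven doses coded 0..6, 50 animals per dose.
x = list(range(7))
r = [6, 7, 33, 39, 45, 50, 50]
n = [50] * 7
print([round(empirical_working_p(ri, ni), 3) for ri, ni in zip(r, n)])
# [0.12, 0.14, 0.66, 0.78, 0.9, 0.99, 0.99]
print([round(pi, 3) for pi in iterative_working_p(x, r, n)])
```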


EXAMPLE 3. COMPARATIVE ASSAYS; PARALLEL LINE PROCEDURE 


The following is an example in which we wish to estimate the toxicity or “potency” of a drug T, to be tested, in terms of another drug S (the “standard”); that is, we wish to answer the question, “How much more (or less) potent is drug T than drug S?” The answer to this question will depend on definition, but one reasonable answer is based on the following rationale. Suppose that with each drug there is a definite relationship between dosage and percentage response, that is, that the greater the dosage the greater the response. Suppose also that these relationships differ between the two drugs, so that for a given concentration the percentage response using T is different from the response using S. If now we consider some definite percentage response and find that it requires a concentration $C_S$ of the standard to produce, say, a 50 per cent effect, while it requires only one-third that concentration of drug T to produce this effect, we may reasonably say that T is three times as potent as S. Suppose, however, it is found that for a 75 per cent response T requires not one-third, but only one-fourth the concentration of S. Then we should have to say that for this response level, drug T is four times as potent as S. This poses the dilemma that there is no unequivocal answer to the question of how much more potent drug T is than drug S. We shall have to say, “Drug T is three times as potent as S for an effect of 50 per cent and four times as potent as S for a 75 per cent effect.”

Let us turn to the logistic function and put the problem in terms of 
the logit representation. If the logistic relation of response is to the log 







dose, the logit plotted against log dose will be a straight line. The
horizontal distance between the two lines representing T and S respec-
tively, at any value of the response, is the difference of the logarithms
of the doses of T and S which produce that response, and this is the
logarithm of the ratio of the doses themselves. If the logit lines are
parallel, the horizontal distance will be the same at all values of re- 
sponse, and the measure of the relative potency as the ratio of the 
dosage concentrations which produce the same response will be 
unequivocal, so far as response level is concerned. If they are not 
parallel, the relative potency will depend on the response level to which 
it is referred, and it seems reasonable under these circumstances to 
refer to the 50 per cent response point as a standard convention. 
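This definition of relative potency can be sketched as follows. The logit lines (a, b) below are hypothetical, with x taken as log₁₀ dose, and the function names are our own. For each drug the sketch finds the log dose that gives response level P, then takes the antilog of the horizontal distance between the two lines. For parallel lines the ratio is the same at every level. For non-parallel lines it depends on the level. At P = 0.5 the ratio is simply the ratio of the two E.D. 50's.

```python
import math


def log_dose_for(p, a, b):
    """Log dose x at which the logit line l = a + b*x gives response p."""
    return (math.log(p / (1 - p)) - a) / b


def relative_potency(p, standard, test):
    """Potency of T relative to S at response level p: the ratio of the S dose
    to the T dose producing that response (x measured as log10 dose)."""
    return 10 ** (log_dose_for(p, *standard) - log_dose_for(p, *test))


# Hypothetical logit lines (a, b) on the log10-dose scale.
S = (-2.0, 4.0)
T_parallel = (-2.0 + 4.0 * math.log10(3), 4.0)   # S shifted left by log10(3)
T_steeper = (0.0, 6.0)                            # not parallel to S

for level in (0.25, 0.50, 0.75):
    print(level,
          round(relative_potency(level, S, T_parallel), 3),   # 3.0 at every level
          round(relative_potency(level, S, T_steeper), 3))    # changes with level
```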

It is a moot question whether the dose response curves of different 
drugs are in fact parallel in the sense referred to, that is, whether the 
ratio of dosage concentrations for equal response is the same for all 
response levels, in general, for most drugs, or ever. Too little investiga- 
tion with large enough numbers of animals has been performed in 
respect of this question, to provide a reliable general answer. To attempt 
to ascertain whether the lines are parallel, by applying statistical sig- 
nificance tests ad hoc to individual samples at hand, is futile, for with 
the numbers in the samples generally employed, the power of existing 
tests is so small that even if the lines are really far from parallel, the 
probability of a significant test result is very small. Certainly there is a 
good deal of evidence that in many cases the response curves of sim- 
ilarly effective drugs with the same animals are not far different from 
parallel, within the 10 per cent to 90 per cent response levels. This is 
perhaps sufficient to justify the general practice of making relative 
assays by means of “parallel assays,” the procedures for which will be 
presently described and illustrated. However, another view is permissi- 
ble. If we do not know or do not have good evidence that the dose 
response curves are parallel, it seems reasonable to make the assay by a 
statistical procedure which would be valid whether the lines are parallel 
or not. If the lines are in fact parallel, the ratio of the E.D. 50’s of the 
two lines separately estimated is an estimate of the distance between 
the parallel lines only slightly less good (larger variance) than the 
estimate obtained by fitting the two lines on the assumption of paral- 
lelism. If the lines are in fact not parallel, then we should not fit parallel 
lines. Since the ratio of the concentrations at the 50 per cent response 
point is a good estimate for either assumption, and since it is very
dubious that response curves are in fact ever actually parallel in a literal
sense, it would seem reasonable that we use the ratio of the E.D. 50’s







and not parallel assays. Occam’s razor, which requires the application 
of a minimum of assumptions, is still one of the soundest principles 
ever enunciated on which to base scientific procedure. To use the 50 
per cent point as the conventional level for comparative assays, even 
in the face of the possibility that the lines are not parallel, is arbitrary, 
but no more so than using the E.D. 50 as the index of potency for an 
individual drug. Two drugs equivalent at the 50 per cent point may 
not be equivalent at other response levels; yet this has not prevented 
the development of a vast statistical literature on the calculation of 
the E.D. 50 in which it is implied that this point is an acceptable con- 
ventional level for the measure of potency. 

In the following example the relative potency of two drugs is es- 
timated by fitting parallel lines, in order to illustrate the computa- 
tional steps for such a procedure, but it should not be taken to imply 
that I am advocating this as obligatory in estimating relative potency. 

The basic principle of estimating two parallel logit lines is the same as that for a single line, that is, we minimize the logit χ². However, there are two logit lines, $\hat{l}_s$ and $\hat{l}_t$, to be fitted, and the assumption of parallelism implies that the β parameters are the same for both lines. There will therefore be three parameters to estimate, $\alpha_s$, $\alpha_t$, and β, the estimates being represented respectively as $a_s$, $a_t$, and b. We shall minimize the total logit χ²,

$$\chi^2(\text{logit}) = \sum_s npq\,(l - \hat{l}_s)^2 + \sum_t npq\,(l - \hat{l}_t)^2. \qquad (16)$$

The normal equations are

$$\sum_s npq\,(l - \hat{l}_s) = 0,$$
$$\sum_t npq\,(l - \hat{l}_t) = 0,$$
$$\sum_s npqx\,(l - \hat{l}_s) + \sum_t npqx\,(l - \hat{l}_t) = \sum\sum npqx\,(l - \hat{l}) = 0,$$

where

$$\hat{l}_s = a_s + bx, \qquad \hat{l}_t = a_t + bx,$$

and the other symbols are defined as previously.

The distance between the fitted parallel lines, symbolized M, is given by $M = (a_t - a_s)/b$; the ratio R of the potency of the test drug to the standard is given by $\log R = M$.

Table 7 gives the details of calculation. 
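The following is a minimal sketch of the parallel-line fit, using the Table 7 data with p as printed there. The function name and data layout are our own. Each drug keeps its own intercept, and the normal equations then reduce to pooling the within-drug sums Σnw(x − x̄)² and Σnw(x − x̄)(l − l̄) to obtain the common slope, as Table 7 does. The weights npq are computed directly rather than read from Table 3, so the fourth decimal may differ slightly from the table.

```python
import math


def parallel_logit_lines(groups):
    """Minimum logit chi-square fit of parallel lines l = a_g + b*x.

    groups maps a drug name to a list of (x, n, p) rows. Each drug keeps its
    own intercept; the common slope pools the within-drug sums of squares
    and products. Returns ({name: a_g}, b)."""
    means, sxx, sxl = {}, 0.0, 0.0
    for name, rows in groups.items():
        x = [row[0] for row in rows]
        nw = [n * p * (1 - p) for _, n, p in rows]           # weights npq
        l = [math.log(p / (1 - p)) for _, _, p in rows]      # observed logits
        sw = sum(nw)
        xbar = sum(w * xi for w, xi in zip(nw, x)) / sw
        lbar = sum(w * li for w, li in zip(nw, l)) / sw
        sxx += sum(w * (xi - xbar) ** 2 for w, xi in zip(nw, x))
        sxl += sum(w * (xi - xbar) * (li - lbar) for w, xi, li in zip(nw, x, l))
        means[name] = (xbar, lbar)
    b = sxl / sxx
    return {name: lbar - b * xbar for name, (xbar, lbar) in means.items()}, b


# Table 7 data: x = log10 concentration, n animals, proportion p killed.
data = {
    "rotenone": [(0.415, 50, 0.120), (0.580, 48, 0.333), (0.708, 46, 0.522),
                 (0.886, 49, 0.857), (1.009, 50, 0.880)],
    "deguelin": [(1.004, 48, 0.375), (1.305, 48, 0.708), (1.481, 49, 0.959),
                 (1.606, 50, 0.940), (1.703, 48, 0.990)],
}
a, b = parallel_logit_lines(data)
M = (a["rotenone"] - a["deguelin"]) / b     # M = log R, test (rotenone) vs standard
print(f"b = {b:.4f}, M = log R = {M:.4f}, R = {10 ** M:.3f}")
```

With these inputs the sketch gives b ≈ 6.50, M ≈ 0.434 and R ≈ 2.7. These agree with Table 7 to within the rounding of the tabulated weights.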


EXAMPLE 4. RELATIVE POTENCY “4-POINT PARALLEL ASSAY” 


This is a widely used scheme for estimating the unknown potency of 
a drug T to be tested relative to a standard S, by making only two 










TABLE 7—EXAMPLE 3 


PARALLEL LINE COMPARATIVE ASSAY. TOXICITY OF
ROTENONE RELATIVE TO DEGUELIN
(From Finney [22], pp. 68, 69)

ROTENONE

| Concentration D, mg./liter | Log D, x | Deaths, r | Total, n | Proportion died, p = r/n | nw* | nwl* | nwx |
|---|---|---|---|---|---|---|---|
| 2.6 | 0.415 | 6 | 50 | 0.120 | 5.2800 | −10.5200 | 2.1912 |
| 3.8 | 0.580 | 16 | 48 | 0.333 | 10.6608 | −7.4064 | 6.1833 |
| 5.1 | 0.708 | 24 | 46 | 0.522 | 11.4770 | 1.0120 | 8.1257 |
| 7.7 | 0.886 | 42 | 49 | 0.857 | 6.0074 | 10.7506 | 5.3226 |
| 10.2 | 1.009 | 44 | 50 | 0.880 | 5.2800 | 10.5200 | 5.3275 |

*w and wl from Table 3.

$$
\begin{aligned}
\sum nw &= 38.7052, & \sum nwx &= 27.1503, & \sum nwl &= 4.3562\\
\bar x &= \frac{\sum nwx}{\sum nw} = 0.7015, & \bar l &= \frac{\sum nwl}{\sum nw} = 0.1125\\
\sum nwx^2 &= 20.3399, & \sum nwlx &= 12.1947\\
\frac{(\sum nwx)^2}{\sum nw} &= 19.0450, & \frac{(\sum nwl)(\sum nwx)}{\sum nw} &= 3.0557\\
\sum nw(x-\bar x)^2 &= 1.2949, & \sum nw(l-\bar l)(x-\bar x) &= 9.1390
\end{aligned}
$$

DEGUELIN

| Concentration D, mg./liter | Log D, x | Deaths, r | Total, n | Proportion died, p = r/n | nw* | nwl* | nwx |
|---|---|---|---|---|---|---|---|
| … | 1.004 | 18 | 48 | 0.375 | 11.2512 | −5.7456 | 11.2962 |
| … | 1.305 | 34 | 48 | 0.708 | 9.9216 | 8.7888 | 12.9477 |
| … | 1.481 | 47 | 49 | 0.959 | 1.9257 | 6.0711 | 2.8520 |
| … | 1.606 | 47 | 50 | 0.940 | 2.8200 | 7.7600 | 4.5289 |
| … | 1.703 | 48 | 48 | 0.990 | 0.4752 | 2.1840 | 0.8093 |

$$
\begin{aligned}
\sum nw &= 26.3937, & \sum nwx &= 32.4341, & \sum nwl &= 19.0583\\
\bar x &= \frac{\sum nwx}{\sum nw} = 1.2289, & \bar l &= \frac{\sum nwl}{\sum nw} = 0.7221\\
\sum nwx^2 &= 41.1136, & \sum nwlx &= 30.8740\\
\frac{(\sum nwx)^2}{\sum nw} &= 39.8569, & \frac{(\sum nwl)(\sum nwx)}{\sum nw} &= 23.4199
\end{aligned}
$$

BOTH PREPARATIONS

| | $\sum nw(x-\bar x)^2$ | $\sum nw(l-\bar l)(x-\bar x)$ |
|---|---|---|
| Deguelin | 1.2567 | 7.4541 |
| Rotenone | 1.2949 | 9.1390 |
| Total ($\sum\sum$) | 2.5516 | 16.5931 |

$$
b = \frac{\sum\sum nw(l-\bar l)(x-\bar x)}{\sum\sum nw(x-\bar x)^2} = 6.5030, \qquad
a = \frac{\sum nwl - b\sum nwx}{\sum nw}: \quad a_R = -4.4491, \quad a_D = -7.2692
$$





588 AMERICAN STATISTICAL ASSOCIATION JOURNAL, SEPTEMBER 1953 
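TABLE 7 (cont.)

$$
\begin{aligned}
M &= \log R = \frac{a_R - a_D}{b} = 0.4337, \qquad R = 2.715\\
s^2_{a'_R} &= \frac{1}{\sum nw} = \frac{1}{38.7052} = 0.02584, \qquad
s^2_{a'_D} = \frac{1}{\sum nw} = \frac{1}{26.3937} = 0.03789\\
s^2_b &= \frac{1}{\sum\sum nw(x-\bar x)^2} = \frac{1}{2.5516} = 0.3919\\
s^2_M &= \frac{1}{b^2}\left[s^2_{a'_R} + s^2_{a'_D} + s^2_b(\bar x_D - \bar x_R - M)^2\right]\\
&= \frac{1}{42.2890}\left[0.02584 + 0.03789 + 0.3919(1.2289 - 0.7015 - 0.4337)^2\right] = 0.001588\\
s_M &= s_{\log R} = 0.040, \qquad s_R = \frac{R\,s_M}{\log_{10} e} = 0.25
\end{aligned}
$$

R = 2.71 ± 0.25    b = 6.50 ± 0.63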


observations with each of the drugs. The known standard is used in two
concentrations, say $D_1$ and $D_2 = kD_1$; the unknown is diluted in the
same proportion as the standard concentration. If the lower and higher
concentrations of each of the drugs are coded respectively $x = 0$ and
$x = 1$, the ratio $R$ of the potency of the unknown to the standard is given
by $\log R = \dfrac{a_T - a_S}{b}\log k$, where $k$ is the ratio of the larger to the
smaller dosage of the standard. The fact that the values of $x$ are either
zero or unity simplifies the summations required, so that a special
format for the calculations is worthwhile; it is illustrated in
Table 8.
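The special format for the 4-point assay can be sketched as follows. The sketch assumes the printed proportions of Table 8 and computes $w = pq$ and $l = \ln(p/q)$ directly rather than from Table 3, so the results should match the table only to about three or four figures; the names `summarize` and `four_point_assay` are hypothetical. Because $x$ is 0 or 1, every sum involving $x$ reduces to the term at $x = 1$.

```python
import math

N = 24      # animals at each dosage (Table 8)
K = 1.5     # ratio of the higher to the lower dose
# Proportions affected at the coded doses x = 0 and x = 1, as printed in Table 8.
TESTED = {0: 0.250, 1: 0.667}
STANDARD = {0: 0.292, 1: 0.875}


def summarize(props):
    """Weighted sums for one preparation with w = pq and l = ln(p/q).

    With x in {0, 1}, sum(w x) and sum(w x^2) both equal w at x = 1,
    and sum(w l x) equals w l at x = 1.
    """
    w = {x: p * (1.0 - p) for x, p in props.items()}
    l = {x: math.log(p / (1.0 - p)) for x, p in props.items()}
    sw = w[0] + w[1]
    swl = w[0] * l[0] + w[1] * l[1]
    swx = w[1]
    sxx = swx - swx ** 2 / sw            # sum w (x - xbar)^2
    sxl = w[1] * l[1] - swl * swx / sw   # sum w (l - lbar)(x - xbar)
    return sw, swl, swx, sxx, sxl


def four_point_assay(test, std, n, k):
    sw_t, swl_t, swx_t, sxx_t, sxl_t = summarize(test)
    sw_s, swl_s, swx_s, sxx_s, sxl_s = summarize(std)
    b = (sxl_t + sxl_s) / (sxx_t + sxx_s)        # common slope
    a_t = (swl_t - b * swx_t) / sw_t
    a_s = (swl_s - b * swx_s) / sw_s
    m = (a_t - a_s) / b                          # distance in coded-dose units
    r = 10.0 ** (m * math.log10(k))              # log R = M log k
    xbar_t, xbar_s = swx_t / sw_t, swx_s / sw_s
    var_b = 1.0 / (n * (sxx_t + sxx_s))
    var_m = (1.0 / (n * sw_t) + 1.0 / (n * sw_s)
             + var_b * (xbar_s - xbar_t - m) ** 2) / b ** 2
    s_log_r = math.sqrt(var_m) * math.log10(k)
    s_r = r * s_log_r * math.log(10.0)           # s_R = R s_logR / log10(e)
    return {"b": b, "s_b": math.sqrt(var_b), "M": m, "R": r, "s_R": s_r}


res = four_point_assay(TESTED, STANDARD, N, K)
print({key: round(value, 4) for key, value in res.items()})
```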


APPENDIX NOTE 1. DIFFICULTIES IN PRACTICE WITH ITERATIVE 
METHODS 


The fact that an iterative procedure is needed for the probit solution 
with maximum likelihood results in a number of practical disadvan- 
tages. In the first place, a formidable amount of rather involved com- 
putation is required, and for this reason alone a number of protests 
have been issued against the method and alternative simpler methods 
have been proposed in order to alleviate the computational labor [14, 
32, 33, 35, 36, 41]. But equally or more important is a consequent 
lack of precision of the estimates as achieved in practice—and this 
appears not to have been sufficiently considered. 

In general practice, where only one cycle of iteration is used, the
method does not yield a definitive solution, so that two workers using
the same data will not necessarily obtain the same estimate. To ensure
definitiveness, strict rules must be laid down, to continue the iterations






TABLE 8—EXAMPLE 4 


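DECAMETHONIUM BROMIDE, KNOWN CONCENTRATION AND
UNKNOWN TO BE TESTED. EXPERIMENT OF DEWS AND
BERKSON, 24 ANIMALS AT EACH DOSAGE

| Designation | Dose, mg./ml. | x | r | p = r/n | w* | wl* |
|---|---|---|---|---|---|---|
| Tested | D | 0 | 6 | 0.250 | 0.1875 | −0.2060 |
| Tested | 1.5 D | 1 | 16 | 0.667 | 0.2221 | 0.1543 |
| Standard | 0.016 | 0 | 7 | 0.292 | 0.2067 | −0.1831 |
| Standard | 0.024 | 1 | 21 | 0.875 | 0.1094 | 0.2128 |

*From Table 3. Subscripts 1 to 4 below refer to the rows in the order listed.

$$
\begin{aligned}
\sum w_T &= w_1 + w_2 = 0.4096, & \sum w_T l &= w_1 l_1 + w_2 l_2 = -0.0517, & \sum w_T x &= w_2 = 0.2221\\
\sum w_S &= w_3 + w_4 = 0.3161, & \sum w_S l &= w_3 l_3 + w_4 l_4 = 0.0297, & \sum w_S x &= w_4 = 0.1094\\
\sum wlx &= w_2 l_2 + w_4 l_4 = 0.3671
\end{aligned}
$$

$$
\begin{aligned}
\sum w(l-\bar l)(x-\bar x) &= \sum wlx - \frac{(\sum w_T l)(\sum w_T x)}{\sum w_T} - \frac{(\sum w_S l)(\sum w_S x)}{\sum w_S} = 0.3671 - (-0.01775) = 0.3848\\
\sum w(x-\bar x)^2 &= \sum wx^2 - \frac{(\sum w_T x)^2}{\sum w_T} - \frac{(\sum w_S x)^2}{\sum w_S} = 0.1732\\
b &= \frac{\sum w(l-\bar l)(x-\bar x)}{\sum w(x-\bar x)^2} = \frac{0.3848}{0.1732} = 2.2217\\
a_T &= \frac{\sum w_T l - b\sum w_T x}{\sum w_T} = \frac{-0.0517 - 2.2217(0.2221)}{0.4096} = -1.3309\\
a_S &= \frac{\sum w_S l - b\sum w_S x}{\sum w_S} = \frac{0.0297 - 2.2217(0.1094)}{0.3161} = -0.6750\\
M &= \frac{a_T - a_S}{b} = -0.2952, \qquad \log R = M\log 1.5 = -0.05198, \qquad R = 0.8872
\end{aligned}
$$

$$
\begin{aligned}
s^2_{a'_T} &= \frac{1}{n\sum w_T} = \frac{1}{24(0.4096)} = 0.1017, \qquad
s^2_{a'_S} = \frac{1}{n\sum w_S} = \frac{1}{24(0.3161)} = 0.1318\\
s^2_b &= \frac{1}{n\sum w(x-\bar x)^2} = \frac{1}{24(0.1732)} = 0.2406, \qquad s_b = 0.49\\
\bar x_T &= \frac{\sum w_T x}{\sum w_T} = \frac{0.2221}{0.4096} = 0.5422, \qquad
\bar x_S = \frac{\sum w_S x}{\sum w_S} = \frac{0.1094}{0.3161} = 0.3461\\
s^2_M &= \frac{1}{b^2}\left[s^2_{a'_S} + s^2_{a'_T} + s^2_b(\bar x_S - \bar x_T - M)^2\right]
= \frac{1}{4.9360}\left[0.1017 + 0.1318 + 0.2406(0.009821)\right] = 0.04778\\
s_M &= 0.2186, \qquad s_{\log R} = s_M\log 1.5 = 0.03850, \qquad
s_R = \frac{R\,s_{\log R}}{\log_{10} e} = 0.07865
\end{aligned}
$$

R = 0.887 ± 0.079    b = 2.22 ± 0.49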







until constancy is attained in the estimates, to a specified number of 
significant figures [1, 29]. Probit solutions which have been published, 
even in authoritative texts advancing the method, and in important 
pharmacologic standardizations which employ the probit method, fre- 
quently cannot be checked arithmetically as maximum likelihood es- 
timates, even to the number of decimal places to which they are carried 
in final publication. Different estimates applying to the same set of
data have been published by different authors, and even by the same 
author [12, 19, 34], the differences reflecting the residual of lack of defi- 
nition in the original graphic solution, not eliminated because an in- 
sufficient number of iterative cycles have been accomplished. 
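The lack of definitiveness can be illustrated numerically. The sketch below assumes Fisher scoring for the probit line $P = \Phi(a + bx)$ (normal deviates without the added 5), uses Python's `statistics.NormalDist` in place of printed probit tables, and starts from two hypothetical provisional lines of the kind two workers might draw by eye through the rotenone data of Table 7. After one cycle the two answers differ; after enough cycles they coincide at the maximum likelihood estimate. The names `probit_cycle` and `iterate` are illustrative.

```python
from statistics import NormalDist

PHI = NormalDist()  # standard normal, used in place of printed probit tables

# Rotenone data from Table 7: (x = log dose, r = deaths, n = animals).
DATA = [(0.415, 6, 50), (0.580, 16, 48), (0.708, 24, 46),
        (0.886, 42, 49), (1.009, 44, 50)]


def probit_cycle(a, b, data):
    """One Fisher-scoring cycle for P = Phi(a + b x).

    Working probits y = eta + (p - P) / Z are regressed on x by weighted least
    squares with weights n Z^2 / (P Q), Z being the normal ordinate at eta.
    """
    sw = swx = swy = swxx = swxy = 0.0
    for x, r, n in data:
        eta = a + b * x
        big_p = PHI.cdf(eta)
        z = PHI.pdf(eta)
        y = eta + (r / n - big_p) / z
        w = n * z * z / (big_p * (1.0 - big_p))
        sw += w
        swx += w * x
        swy += w * y
        swxx += w * x * x
        swxy += w * x * y
    b_new = (swxy - swx * swy / sw) / (swxx - swx * swx / sw)
    a_new = (swy - b_new * swx) / sw
    return a_new, b_new


def iterate(a, b, data, cycles):
    """Apply `cycles` scoring cycles starting from the provisional line (a, b)."""
    for _ in range(cycles):
        a, b = probit_cycle(a, b, data)
    return a, b


# Two hypothetical provisional lines, as two workers might draw them by eye.
START_1, START_2 = (-2.8, 4.0), (-3.3, 4.6)
one_1 = iterate(*START_1, DATA, 1)
one_2 = iterate(*START_2, DATA, 1)
conv_1 = iterate(*START_1, DATA, 50)
conv_2 = iterate(*START_2, DATA, 50)
print("one cycle:   ", one_1, one_2)
print("fifty cycles:", conv_1, conv_2)
```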

If definitiveness of the estimates is to be achieved, even the decision 
as to what tables should be employed and the manner of their use is 
something of a problem. For the example illustrative of the probit 
method used by Fisher and Yates [26], there are given two sets of 
estimates. One, obtained with a single cycle of iteration, is b = .68906, 
L.D. 50=6.618; regarding the other, the authors say that “a much 
more precise fit gives 6.609 . . . for the 50 per cent point . . . the more 
exact value (for the slope) is .7126.” No information is proffered as to
the number of cycles of iteration required for the “more exact” estimates,
nor is a precise description given of how the tables of the volume
in which the example is incorporated were used.⁸ It is a novel statistical
doctrine that is reflected in this example, where it appears that
there are two kinds of maximum likelihood estimates, one for ordinary 
everyday use and one for statistical Sundays when we use “more 
exact” estimates. Had the authors given b =.7, L.D. 50 =6.6, as the first 
estimates, and b =.7126, L.D. 50 =6.609, as the second, both sets could 
intelligibly be regarded as correct maximum likelihood estimates, the 
last more precisely determined than the first. But in respect to the 
estimates b = .68906, L.D. 50 =6.618, and b =.7126, L.D. 50 =6.609, how 
can both sets be correct maximum likelihood estimates? 

For the example in Finney’s text [24], used to illustrate the computa- 
tions of probit analysis, he gives b =4.176 + 0.466, and then says, “The 
slope has been altered from its provisional value of 4.01 by an amount 
equal to about one-third of its standard error, and if an accurate value 
of b were particularly required (sic)⁹ a further cycle of computations
would be desirable; the next value obtained for b is, in fact, 4.196, 
the alteration being only 4% of the standard error.” The value given, 





⁸ Garwood [29], working with these same data, obtained a slightly different value for b (.7128) and
stated that he found it necessary to use Pearson’s tables [38] to obtain the precise maximum likelihood
estimates.

⁹ The value already calculated involved computations with the use of 7 significant figures.









4.196, doubtless is obtainable, if the tables in Finney’s text are used in 
some particular way, but I have not been able to reproduce it exactly, 
and when I used the W.P.A. tables of the normal deviate [39], employ- 
ing methods which appear to me to be correct, iteration toward a 
maximum likelihood estimate yielded a value for b which is smaller, 
not larger, than 4.176. 

The probit maximum likelihood method need not have been devel- 
oped along lines which have led to the present chaotic situation, in 
which published estimates in official standardizations are not definitive 
and cannot be checked for the data on which they are based. In the 
article in which the mathematical development of the maximum likeli- 
hood estimate of the probit equation was first set forth, Irwin and 
Cheeseman [31] obtained the “first approximation,” not by a graphical 
fit accomplished by eye, but by using the observed probits, and 
weights $z^2/pq$ obtained from the observations. Had the procedure advanced
by Irwin and Cheeseman been adopted as standard, then a
notation with a published estimate, indicating the number of iterative
cycles which had been accomplished, would have rendered the estimate
reproducible, even if on some occasion it still might be criticized as
not being the maximum likelihood estimate, because an insufficient 
number of iterative cycles had been employed. However, the procedure 
of Irwin and Cheeseman was abandoned in favor of the one employing 
a graphic fit by eye for the first approximation. 
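A reproducible first approximation of the kind Irwin and Cheeseman used can be sketched as a weighted least-squares line through the observed probits, each weighted by $nz^2/(pq)$, with $z$ the normal ordinate at the observed probit. This is an illustrative reading of their starting rule, not their published computing scheme; it requires $0 < p < 1$ in every group, uses normal deviates without the added 5, and the name `irwin_cheeseman_start` is hypothetical. Unlike a line drawn by eye, it gives the same starting values to every worker.

```python
from statistics import NormalDist

PHI = NormalDist()

# Rotenone data from Table 7: (x = log dose, r = deaths, n = animals).
DATA = [(0.415, 6, 50), (0.580, 16, 48), (0.708, 24, 46),
        (0.886, 42, 49), (1.009, 44, 50)]


def irwin_cheeseman_start(data):
    """Weighted least-squares line through the observed probits.

    Each group contributes y = Phi^{-1}(p) with weight n z^2 / (p q), where z is
    the normal ordinate at y. Groups with p = 0 or p = 1 are rejected.
    """
    sw = swx = swy = swxx = swxy = 0.0
    for x, r, n in data:
        p = r / n
        if not 0.0 < p < 1.0:
            raise ValueError("observed proportion must lie strictly between 0 and 1")
        y = PHI.inv_cdf(p)
        z = PHI.pdf(y)
        w = n * z * z / (p * (1.0 - p))
        sw += w
        swx += w * x
        swy += w * y
        swxx += w * x * x
        swxy += w * x * y
    b = (swxy - swx * swy / sw) / (swxx - swx * swx / sw)
    a = (swy - b * swx) / sw
    return a, b


a0, b0 = irwin_cheeseman_start(DATA)
print(round(a0, 4), round(b0, 4))
```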

Referring to the calculations of the probit estimates in ten repeated 
experiments, by three iterative methods, Method III being the maxi- 
mum likelihood estimate, Irwin and Cheeseman say [31], “Starting with 
the probits corresponding to the observed mortalities... the con- 
vergence is not very rapid. About 6 successive approximations are 
needed to get accuracy to 2 significant figures. .. . Sample D needed 8, 
11, and 9 approximations respectively for Methods I, II, and III.” 

The procedure of “probit analysis” as widely advanced and prac- 
ticed, consisting of a single cycle of iteration based on a provisional 
graphical estimate, actually is not a maximum likelihood estimate, but 
only a somewhat modified graphical solution. Since it is a step in the 
right direction toward the maximum likelihood estimate, perhaps it is 
entitled to the designation “likelihood estimate.” If one or two more 
iterations are performed, it could be called a “very likelihood estimate” ; 
if as many as 9 iterations are accomplished, as in the example from 
Irwin and Cheeseman referred to above, an “exceedingly likely likeli- 
hood estimate,” and so forth. A really mathematical maximum likeli- 
hood estimate in the present circumstance is rarely attainable, but this 







estimate appears to be held so noble an objective that perhaps we
should be contented only to aspire to achieve it. However, it must be
remembered that it is solely to the actual maximum likelihood estimate
that the optimum properties pertain, which Finney insistently
claims for “probit analysis.” These optimum properties do not refer 
to a “likelihood estimate,” nor even to a “practically good enough” 
maximum likelihood estimate. 


APPENDIX NOTE 2. OTHER ESTIMATES OF THE LOGISTIC PARAMETERS; 
THE MAXIMUM LIKELIHOOD ESTIMATE 


In the investigations of the present author, it has been found that
there is at least one other estimate of the parameters α, β of the logistic
function which has smaller sampling error (mean square error) than
the minimum logit χ² estimate. This is the “Blackwellized” minimum
logit χ² estimate. The Blackwellized estimate is the expectation
(weighted mean) of the estimates corresponding to the samples of a
sufficiency group, which is the group of samples for which the sufficient
statistics [in the present case (Σnp, Σnpx)] have the same value. By
an extension of Blackwell’s theorem [9] to biased estimates, these
estimates have a mean square error equal to or smaller than that of the
original estimates, and their bias, if any, will be the same as that of the
original estimates. In the present case, for each of the minimum χ²
estimates, the m.s.e. is less than, rather than equal to, the m.s.e. of these
estimates before Blackwellization. This is a consequence of the fact
that with the χ² estimates, the estimates from the samples in the
sufficiency group are not identical. The maximum likelihood estimate, on
the contrary, while it is sufficient, necessarily is identical for all
samples in the sufficiency group, and therefore is unchanged by
Blackwellization.

The Blackwellized estimates have a characteristic in common with
the maximum likelihood estimate, in that the estimate is the same for
all samples having the same value of the sufficient statistics; this
implies that for these estimates, the sufficient statistics Σnp and Σnpx
of the sample uniquely determine the estimates. It is therefore possible,
in principle, to prepare a table with two-way entry, corresponding to
Σnp and Σnpx, which gives the Blackwellized estimates. For the
Blackwellized minimum logit χ² estimate this would involve many
calculations and a great deal of arithmetical labor.¹⁰
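Blackwellization can be illustrated by brute force for a very small design. The sketch assumes a hypothetical experiment with three doses coded −1, 0, +1 and three animals at each, and, to keep the code short, uses a stand-in estimator (the unweighted least-squares slope of empirical logits with 0.5 added to each cell), not the minimum logit χ² estimate. It relies on the standard exponential-family fact that, under the logistic model, the conditional probability of a sample given (Σr, Σrx) is proportional to the product of binomial coefficients, free of α and β; the Blackwellized value is therefore a weighted mean over each sufficiency group. All names are illustrative.

```python
import math
from itertools import product

XS = (-1.0, 0.0, 1.0)  # hypothetical coded doses
N = 3                  # hypothetical number of animals at each dose


def sufficient_stats(rs):
    """(sum r, sum r x): the sufficient statistics when every n is equal."""
    return sum(rs), sum(r * x for r, x in zip(rs, XS))


def conditional_weight(rs):
    """Relative probability of sample rs within its sufficiency group.

    P(rs) = prod C(N, r) * exp(alpha * T0 + beta * T1) / const, so, given
    (T0, T1), the parameters cancel and only prod C(N, r) remains.
    """
    return math.prod(math.comb(N, r) for r in rs)


def slope_estimate(rs):
    """Stand-in estimator (NOT minimum logit chi-square): least-squares slope
    of the empirical logits ln((r + 0.5) / (N - r + 0.5)) on x."""
    ls = [math.log((r + 0.5) / (N - r + 0.5)) for r in rs]
    xbar = sum(XS) / len(XS)
    lbar = sum(ls) / len(ls)
    sxl = sum((x - xbar) * (l - lbar) for x, l in zip(XS, ls))
    sxx = sum((x - xbar) ** 2 for x in XS)
    return sxl / sxx


def blackwellize(estimator):
    """Map each value of the sufficient statistics to the conditional mean of
    `estimator` over the samples that share it (the sufficiency group)."""
    groups = {}
    for rs in product(range(N + 1), repeat=len(XS)):
        groups.setdefault(sufficient_stats(rs), []).append(rs)
    table = {}
    for key, members in groups.items():
        total = sum(conditional_weight(rs) for rs in members)
        table[key] = sum(conditional_weight(rs) * estimator(rs) for rs in members) / total
    return table, groups


blackwellized, groups = blackwellize(slope_estimate)
for rs in groups[(4, 1.0)]:
    print(rs, round(slope_estimate(rs), 4))
print("Blackwellized:", round(blackwellized[(4, 1.0)], 4))
```

The two samples (0, 3, 1) and (1, 1, 2) share Σr = 4 and Σrx = 1 but give different slopes; the Blackwellized estimator replaces both by their weighted mean, so it depends on the data only through the sufficient statistics.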





¹⁰ I have, however, calculated the Blackwellized estimate for several special cases, and confirmed
by direct computation that it has a smaller m.s.e. than either the maximum likelihood estimate or the
minimum χ² estimates.







The maximum likelihood estimate has a larger mean square error
than the minimum logit χ² estimate and therefore cannot be
considered the best estimate available for the logistic parameters. If there
is no practical reason to the contrary, such as difficulty of computation,
it is certainly a sound principle that the best estimate be used.
However, the maximum likelihood estimate is generally a very good
estimate even if it is not always the best, and if there are circumstances
in which it is easier to obtain than the minimum logit χ² estimate,
it should not be barred from good statistical practice. The maximum
likelihood estimate of the logistic function does in fact have some
characteristics which, with necessary preliminary work, make it obtainable
even more easily than the minimum logit χ² estimate.

In general, it is possible to compute the maximum likelihood estimate 
of the logistic parameters by iterative procedures using logits, anal- 
ogous with those used for obtaining the maximum likelihood es- 
timate of the parameters of the integrated normal curve, using probits 
[6, 8, 26]. When obtained in this way, the estimates of the logistic pa- 
rameters have the same practically unsatisfactory character as the 
probit or other iteratively obtained estimates—that is, they are not 
accurately or even definitively obtained unless a sufficient number of 
iterative cycles are accomplished to meet a specified degree of pre- 
cision. However, the maximum likelihood estimate of the logistic func- 
tion does not always require such iterative procedures. For instance, in 
the case of three equally spaced doses x with the same number of animals
at each of these, Wilson and Worcester [45] have provided a
cubic equation in terms of two easily computed statistics of the obser- 
vations, the explicit solution of which yields the exact maximum 
likelihood estimates. It is not arithmetically easy to obtain the solution 
of this equation, but these authors have presented an approximation 
which can be easily solved, that gives the correct maximum likelihood 
estimate to five or six significant figures! For more than three or per- 
haps four doses, it is impossible to develop explicitly soluble equations 
such as Wilson provided for three doses. However, following is de- 
scribed a scheme which could provide the solution directly, exact to any 
desired number of places, for any specified arrangement of dosage and 
number of animals. 

The normal equations for the maximum likelihood estimates of the 
parameters of the logistic function are 


$$\sum np = \sum n\hat P, \qquad \sum npx = \sum n\hat P x$$













where n is the number of animals, p the observed proportion affected,
and $\hat P = 1/(1 + e^{-(a + bx)})$, a and b being the maximum likelihood estimates.
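Because the normal equations involve the data only through Σnp and Σnpx, any numerical solver gives the same answer for two samples that share those totals. The sketch below solves the two equations by Newton-Raphson, a technique swapped in here for brevity (the text itself proposes nomograms), for a hypothetical design of four doses with 10 animals each; the function name `logistic_mle` and the example counts are assumptions.

```python
import math

XS = (-1.5, -0.5, 0.5, 1.5)   # hypothetical coded doses
NS = (10, 10, 10, 10)          # hypothetical animals per dose


def logistic_mle(xs, ns, rs, iters=50):
    """Solve  sum n p = sum n P  and  sum n p x = sum n P x  for
    P = 1 / (1 + exp(-(a + b x))) by Newton-Raphson.

    The data enter only through t0 = sum r and t1 = sum r x (since n p = r).
    """
    t0 = sum(rs)
    t1 = sum(r * x for r, x in zip(rs, xs))
    a = b = 0.0
    for _ in range(iters):
        ps = [1.0 / (1.0 + math.exp(-(a + b * x))) for x in xs]
        g0 = t0 - sum(n * p for n, p in zip(ns, ps))
        g1 = t1 - sum(n * p * x for n, p, x in zip(ns, ps, xs))
        ws = [n * p * (1.0 - p) for n, p in zip(ns, ps)]
        h00 = sum(ws)
        h01 = sum(w * x for w, x in zip(ws, xs))
        h11 = sum(w * x * x for w, x in zip(ws, xs))
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


# Two different samples with the same sum r = 20 and sum r x = 10.
sample_1 = (2, 4, 6, 8)
sample_2 = (2, 5, 4, 9)
print(logistic_mle(XS, NS, sample_1))
print(logistic_mle(XS, NS, sample_2))
```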

It may be observed at once that the estimates are unique functions
of two easily computed statistics, Σnp and Σnpx (and, if the n’s are
equal, of Σp and Σpx), which are in fact the sufficient statistics for
these parameters. Hence, as has been mentioned for the
Blackwellized estimates, it is possible to prepare a table with the
two-way entry Σp and Σpx (considering the case with equal n) which
will provide the maximum likelihood estimates. Now, while it appears
at present that making the necessary computations for the
Blackwellized estimates would involve a prohibitive amount of arithmetical
work, the maximum likelihood estimates can be provided by a
nomogram that is not difficult to construct. Consider a graph sheet on the
co-ordinates of which are scaled Σp and Σpx. For any pair of values a₁,
b₁ which are the maximum likelihood estimates of some α, β, and with
a specified arrangement of dosages, the values of Σp and Σpx
corresponding to these estimates are defined and given by ΣP̂ and ΣP̂x,
where P̂ corresponds to the logistic function with α = a₁ and β = b₁. The
values of ΣP̂, ΣP̂x correspond to a point on the graph. Now, if we
change b but not a, so that the estimates are a₁, b₂, another point will
be located corresponding to the same value of a as the first and a
different value of b. In this way a series of points (Σp, Σpx) can be located
which are the locus of the iso-a₁ values of the maximum likelihood
estimates of α; all samples with maximum likelihood estimate a₁ have
their (Σp, Σpx) values located on this line. Hence if the Σp, Σpx of
any sample fall on this line, a₁ is the maximum likelihood estimate of α.
In the same manner the iso-lines for other values of the estimates of
α and β are located, and on the resulting nomogram the estimates can
be read directly, corresponding to the values of Σp and Σpx of the
sample; the accuracy of the estimate will be limited only by the
precision with which the graph can be read, and therefore by the scale on
which the nomogram is constructed. In a similar manner one can
construct a nomogram from which one can read the maximum likelihood
estimate of β or of the E.D. 50. An example of such a nomogram
giving the estimates of the E.D. 50 corresponding to the situation of
four equally spaced doses and equal numbers of animals at each is given
in Figure 2. It is planned to construct additional nomograms
corresponding to three, five, six, and seven doses.
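The construction of such a nomogram can be sketched in a few lines: fix one estimate, vary the other, and compute the expected totals (ΣP̂, ΣP̂x) that locate each point. The sketch assumes the four-dose arrangement of Figure 2 (doses coded −1.5, −0.5, +0.5, +1.5, equal n) and the parametrization $P = 1/(1 + e^{-(\alpha + \beta x)})$, for which the E.D. 50 is $-\alpha/\beta$; it computes coordinates only, not the drawing, and all function names are hypothetical.

```python
import math

XS = (-1.5, -0.5, 0.5, 1.5)  # four equally spaced doses, as in Figure 2


def expected_stats(alpha, beta, xs=XS):
    """(sum P, sum P x) for P = 1 / (1 + exp(-(alpha + beta x))).

    A sample with equal n whose observed (sum p, sum p x) equals these values
    has maximum likelihood estimates exactly (alpha, beta).
    """
    ps = [1.0 / (1.0 + math.exp(-(alpha + beta * x))) for x in xs]
    return sum(ps), sum(p * x for p, x in zip(ps, xs))


def iso_alpha_line(alpha, betas):
    """Points of the graph sheet along which the estimate of alpha is fixed."""
    return [expected_stats(alpha, b) for b in betas]


def iso_ed50_line(ed50, betas):
    """Points along which the E.D. 50 = -alpha / beta is fixed (alpha = -beta * ed50)."""
    return [expected_stats(-b * ed50, b) for b in betas]


def sum_px_from_coded(ps):
    """Doses coded 0, 1, 2, 3: sum p x = (p2 + 2 p3 + 3 p4) - 1.5 sum p."""
    sum_p = sum(ps)
    sum_px_prime = sum(i * p for i, p in enumerate(ps))
    return sum_px_prime - 1.5 * sum_p


for point in iso_ed50_line(0.25, [0.5, 1.0, 2.0]):
    print([round(v, 4) for v in point])
```

Joining the points of `iso_ed50_line` for each chosen E.D. 50 gives one contour of a nomogram like Figure 2; `sum_px_from_coded` reproduces the shortcut described in the caption.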

Worcester and Wilson [44] have already provided a table giving the 
maximum likelihood estimates calculated from Wilson’s equation
referred to previously, for the case of three equally spaced doses with 





[Figure 2: nomogram giving the maximum likelihood estimate of the E.D. 50, entered with Σp and Σpx]


Fig. 2. The nomogram in the present figure is for illustration only; for actual
use it is to be printed on a larger scale. It gives the maximum likelihood estimate
of γ, the E.D. 50, entering with Σp and Σpx, for an arrangement of four equally
spaced doses x scaled −1.5, −0.5, +0.5, +1.5, at which the observed
proportions affected are respectively p₁, p₂, p₃, p₄. For calculation, it is convenient to
scale the doses as x′ = 0, 1, 2, 3 and to calculate Σp and Σpx′ = p₂ + 2p₃ + 3p₄; then
Σpx = Σpx′ − 1.5Σp.

The nomogram gives also the estimate of β for several widely separated
values of β. It would be possible to provide also values of β at small intervals, as
well as α and the asymptotic variances of these, but to read more than one
function of the statistics on the same nomogram would be confusing; if considered
worthwhile, separate nomograms can be drawn for each function which is
wanted.













equal n at each, in terms of the statistics $A = 2(p_1 + p_2 + p_3) - 3$ and
$B = p_3 - p_1$, where $p_1$, $p_2$ and $p_3$ are the observed proportions at the
doses scaled −1, 0, +1 respectively. It is seen that $A = 2\sum p - 3$ and $B = \sum px$,
so that their table may be viewed as a special case of the general one for
which the nomograms are being prepared.


APPENDIX NOTE 3. DEFINITION OF LOGIT 


The use of the straight-line transform, which I baptized as the “logit” 
in 1944 [7], is found in early articles which utilized the logistic function, 
an example being its use by Von Krogh (1916) in illustrating the fit 
of his logistic law of hemolysis [42]. This is not surprising, since the use 
of the straight-line transform method of fitting a curve is very old and 
elementary. The first published table of logits (though of course not by 
that name), so far as I know, is that of Yule [47] (1925). In 1944 I 
published a nomogram [7] from which logits and antilogits can be read, 
in which I inadvertently reversed the sensible convention of attaching 
a sign to the logit by which an increase of logit corresponds to an in- 
crease of antilogit, a gaucherie I later corrected [6]. In March, 1950, 
I issued an extensive table of logits, with weights and working values, 
for the facilitation of fitting the logistic function by (a) maximum likelihood,
(b) minimum (Pearson) χ², or (c) “least squares,” published at
private expense and distributed widely for trial, but which has been 
withheld from general circulation, awaiting the results of investigation 
of the relative properties of these estimates. 

In 1947 Finney published a table [21] under the title “Transformation
of Percentages to Logits,” in which are given not the logits l as I
defined them, but instead a quantity 0.5l + 5. This alteration is of the
species effected in the normal deviate of Galton and Shepard [28],
when the number 5 was added to it in order to create “probits,” a
change the wisdom of which has been widely questioned (see, for example,
the discussion by Fieller and Irwin in reference [21]). However, that
modification was comparatively simple, and the designation of the original
authors was not used for the substitute quantity. A later publication
by Finney [20] contains similar, more extensive tabulations, which
resemble my 1950 tables except in respect to the alteration referred
to. That alteration destroys the natural mathematical symmetry of
the logit, makes tabling twice as long as necessary, results in excessive
difficulty of computation because of the large numbers involved, and
introduces arbitrary constants into, and therefore confuses, all
mathematical developments involving the logistic function. Whatever
the putative merits of Finney’s quantities, they should not be referred
to as logits.
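The symmetry argument can be checked directly. With $l = \ln(p/q)$, the logit of $q$ is the negative of the logit of $p$ and the logit of one half is zero, so a table for $p \le 0.5$ also serves for $p > 0.5$ by a change of sign; the quantity $0.5l + 5$ instead satisfies $f(p) + f(q) = 10$. A minimal sketch (the function names are illustrative):

```python
import math


def logit(p):
    """Berkson's logit: l = ln(p / q), q = 1 - p."""
    return math.log(p / (1.0 - p))


def antilogit(l):
    """Inverse of the logit: p = 1 / (1 + e^(-l))."""
    return 1.0 / (1.0 + math.exp(-l))


def finney_quantity(p):
    """The quantity 0.5 l + 5 tabulated in Finney's 1947 table."""
    return 0.5 * logit(p) + 5.0


for p in (0.1, 0.25, 0.5, 0.75, 0.9):
    print(f"p={p:.2f}  logit={logit(p):+.4f}  0.5l+5={finney_quantity(p):.4f}")
```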










REFERENCES 


[1] Armitage, P., and Allen, I., “Methods of estimating the L.D. 50 in quantal 
response data,” Journal of Hygiene, 48 (1950), 298-322. 

[2] Belk, W. P., and Sunderman, F. W., “A survey of the accuracy of chemical 
analyses in clinical laboratories,” American Journal of Clinical Pathology, 17 
(1947), 853-61. 

[3] Berkson, Joseph, “Relative precision of least squares and maximum likeli- 
hood estimates of regression coefficients,” (Abstract), Annals of Mathemati- 
cal Statistics, 23 (1952), 148. This report (Boston, Dec. 29, 1951) was based 
upon calculation of the total distribution. 

[4] Berkson, Joseph, “Relative precision of minimum chi-square and maximum 
likelihood estimates of regression coefficients,” Proceedings of the Second 
Berkeley Symposium on Mathematical Statistics and Probability. Berkeley 
and Los Angeles: Univ. of Calif. Press, 1951. See pp. 471-79. 

[5] Berkson, Joseph, “Are there two regressions?” Journal of the American Sta- 
tistical Association, 45 (1950), 164-80. 

[6] Berkson, Joseph, “Minimum chi-square and maximum likelihood solution in 
terms of a linear transform, with particular reference to bio-assay,” Journal 
of the American Statistical Association, 44 (1949), 273-78. 

[7] Berkson, Joseph, “Application of the logistic function to bio-assay,” Journal 
of the American Statistical Association, 39 (1944), 357-65. 

[8] Berkson, Joseph, Ibid. See footnote p. 363.

[9] Blackwell, David, “Conditional expectation and unbiased sequential
estimation,” Annals of Mathematical Statistics, 18 (1947), 105-10.

[10] Bliss, C. I., “The comparison of dosage-mortality data,” Annals of Applied 
Biology, 22 (1935), 307-33. 

[11] Bliss, C. I., “The calculation of the dosage-mortality curve,” Annals of Ap- 
plied Biology, 22 (1935), 134-67. 

[12] Burn, J. H., Finney, D. J., and Goodwin, L. G., Biological Standardization 
(Second edition). London: Oxford Press, 1950. See p. 147. 

[13] Cramér, Harald, Mathematical Methods of Statistics. Princeton, New Jersey: 
Princeton University Press, 1946. 

[14] DeBeer, E. J., “The calculation of biological assay results by graphic meth- 
ods. The all-or-none type of response,” Journal of Pharmacology and Ex- 
perimental Therapeutics, 85 (1945), 1-13. 

[15] Dews, Peter, and Berkson, Joseph, On the Error of Bio-Assay with Quantal 
Response. Presented at Biostatistics Conference of Iowa State College, 
Ames, Iowa, June 16 to July 18, 1952. Copy of manuscript available on re- 
quest. 

[16] Elveback, Lila, An Expository Note on Estimate in the Controlled Experiment. 
Unpublished data. 

[17] Emmens, C. W., “The dose response relation for certain principles of the 
pituitary gland, and of the serum and urine of pregnancy,” Journal of Endo- 
crinology, 2 (1941), 194-225. 

[18] Epstein, Benjamin, and Churchman, C. W., “On the statistics of sensitivity 
data,” Annals of Mathematical Statistics, 15 (1944), 90-96. 

[19] Finney, D. J., “Graphical estimation of relative potency from quantal re- 
sponses,” Journal of Pharmacology and Experimental Therapeutics, 104 
(1952), 440-44. 










[20] Finney, D. J., Statistical Method in Biological Assay. London: Charles
Griffin & Co., Ltd., 1952.

[21] Finney, D. J., “The principles of biological assay,” Journal of the Royal Sta- 
tistical Society, 9 (1947), 46-91. 

[22] Finney, D. J., Probit Analysis. A Statistical Treatment of the Sigmoid
Response Curve. London: Cambridge University Press, 1947.

[23] Finney, D. J., Probit Analysis. Ibid. See p. 201. 

[24] Finney, D. J., Probit Analysis. Ibid. See pp. 54-55. 

[25] Fisher, R. A., Contributions to Mathematical Statistics. New York: John 
Wiley & Sons, Inc., 1950. Consult index under “Sufficiency,” “Sufficient sta- 
tistics,” “Information.” 

[26] Fisher, R. A., and Yates, F., Statistical Tables for Biological, Agricultural 
and Medical Research. Second edition. London and Edinburgh: Oliver and
Boyd (1943). See pp. 8-11. 

[27] Gaddum, J. H., Methods of Biological Assay Depending on Quantal Response. 
Medical Research Council, London Special Report Series No. 183 (1933). 
See p. 24. 

[28] Galton, F., “Grades and deviates, with a table of deviates of the normal 
curve by W. F. Shepard,” Biometrika, 5 (1906-1907), 400-06. 

[29] Garwood, F., “The application of maximum likelihood to dosage-mortality 
curves,” Biometrika, 32 (1941), 46-58. 

[30] Irwin, J. O., and Cheeseman, E. A., “On an approximate method of deter- 
mining the median effective dose and its error in the case of a quantal re- 
sponse,” Journal of Hygiene, 39 (1939), 574-80. 

[31] Irwin, J. O., and Cheeseman, E. A., “On the maximum likelihood method of 
determining dosage-response curves and approximations to the median-ef- 
fective dose, in cases of a quantal response,” Journal of the Royal Statistical 
Society, Supplement 6 (1939), 174-85. See p. 182. 

[32] Knudsen, L. F., and Curtis, J. M., “The use of the angular transformation 
in biological assays,” Journal of the American Statistical Association, 42 
(1947), 282-96. 

[33] Litchfield, J. T., Jr., and Fertig, J. W., “On a graphic solution of the dosage- 
effect curve,” Bulletin of the Johns Hopkins Hospital, 69 (1941), 276-86. 

[34] Litchfield, J. T., Jr., and Wilcoxon, F., “The reliability of graphic estimates 
of relative potency from dose-per cent effect curves,” Journal of Pharmacol- 
ogy and Experimental Therapeutics, 108 (1953), 11-18. 

[35] Litchfield, J. T., Jr., and Wilcoxon, F., “A simplified method of evaluating 
dose effect experiments,” Journal of Pharmacology and Experimental Thera- 
peutics, 96 (1949), 99-113. 

[36] Miller, L. C., and Tainter, M. L., “Estimation of the E.D. 50 and its error 
by means of logarithmic-probit paper,” Proceedings of the Society of Experi- 
mental Biology and Medicine, 57 (1944), 261-64. 

[37] Pearl, Raymond, Introduction to Medical Biometry and Statistics. Third Edi- 
tion. Philadelphia: W. B. Saunders Co., 1940. 

[38] Pearson, K., Tables for Statisticians and Biometricians, Part 1. Third Edition. 
Biometrika Office, 1930. 

[39] Tables of Probability Functions, Vol. II. Federal Works Agency, Work Proj- 
ects Administration for the City of New York, 1942. Arnold N. Lowan, 
Technical Director. 



















[40] Taylor, W. F. “Distance functions and regular best asymptotically normal 
estimates,” Annals of Mathematical Statistics, 24 (1953), 85-92. 

[41] Tripod, J., “Proceedings of International Biometrics Conference. Discus- 
sion on biometric aspects of biological assay,” Biometrics, 6 (1950), 328-29. 

[42] Von Krogh, Mentz, “Colloidal chemistry and immunology,” Journal of In- 
fectious Diseases, 19 (1916), 452-77. 

[43] Wilson, E. B., “The logistic or autocatalytic grid,” Proceedings of the National
Academy of Sciences, 11 (1925), 451-56.

[44] Worcester, Jane, and Wilson, E. B., “A table determining L.D. 50 or the 
fifty per cent end point,” Proceedings of the National Academy of Sciences, 29 
(1943), 207-12. 

[45] Wilson, E. B., and Worcester, Jane, “The determination of L.D. 50 and its 
sampling error in bio-assay. II,” Proceedings of the National Academy of Sci- 
ences, 29 (1943), 114-20. 

[46] Wilson, E. B., and Worcester, Jane, “The determination of L.D. 50 and its 
sampling error in bio-assay,” Proceedings of the National Academy of Sciences, 
29 (1943), 79-85. 

[47] Yule, G. U., “The growth of population and the factors which control it,” 
Journal of the Royal Statistical Society, 88 (1925), 1-62. 











CRITICAL VALUES OF THE LOG-NORMAL 
DISTRIBUTION 


Jack Moshman
Oak Ridge National Laboratory 


INTRODUCTION 


A common statistical problem is that of testing a null hypothesis using
a statistic drawn from some unknown distribution. The general
configuration of its probability density function is known from
empirical evidence. It is suggested that, for certain applications, the
logarithmic-normal distribution be used to approximate the unknown
distribution by equating the first three moments. It would then be
convenient to have a table of critical values of the log-normal
distribution, standardized for the first two moments and tabulated for various
values of the skewness.
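A standardized critical value of this kind can be sketched as follows, assuming a three-parameter log-normal $X = \tau + e^{\mu + \sigma U}$ with $U$ standard normal. Its skewness depends only on $\sigma$, through $\gamma = (w + 2)\sqrt{w - 1}$ with $w = e^{\sigma^2}$, so $\sigma$ can be found from $\gamma$ by bisection, and the standardized percentage point $(x - E X)/\mathrm{sd}\,X$ does not depend on $\tau$ or $\mu$. This is an illustrative computation using Python's `statistics.NormalDist`, not the method by which the tables of this paper were prepared; the function names are hypothetical.

```python
import math
from statistics import NormalDist


def sigma_for_skewness(gamma):
    """Solve gamma = (w + 2) * sqrt(w - 1), w = exp(sigma^2), for sigma > 0."""
    if gamma <= 0.0:
        raise ValueError("a log-normal distribution has positive skewness")

    def skew(w):
        return (w + 2.0) * math.sqrt(w - 1.0)

    lo, hi = 1.0, 2.0
    while skew(hi) < gamma:          # skew(w) increases with w
        hi *= 2.0
    for _ in range(200):             # bisection on w
        mid = 0.5 * (lo + hi)
        if skew(mid) < gamma:
            lo = mid
        else:
            hi = mid
    return math.sqrt(math.log(0.5 * (lo + hi)))


def standardized_critical_value(gamma, upper_tail):
    """c such that P((X - E X) / sd X > c) = upper_tail for a log-normal X
    with skewness gamma; the location and scale of X do not affect c."""
    sigma = sigma_for_skewness(gamma)
    z = NormalDist().inv_cdf(1.0 - upper_tail)
    quantile = math.exp(sigma * z)                # quantile of exp(sigma U)
    mean = math.exp(sigma * sigma / 2.0)
    sd = mean * math.sqrt(math.exp(sigma * sigma) - 1.0)
    return (quantile - mean) / sd


for g in (0.5, 1.0, 2.0):
    print(g, round(standardized_critical_value(g, 0.05), 3),
          round(standardized_critical_value(g, 0.95), 3))
```

For small skewness the values approach the normal points ±1.645; as the skewness grows, the upper point moves outward and the lower point inward.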


PEARSON TYPE III DISTRIBUTION 


The Pearson Type III distribution is the only three-parameter dis- 
tribution whose integral has been extensively tabulated and is gen- 
erally available. There are two important differences between the log- 
normal and Type III distributions in spite of their superficial similarity. 
These differences may be exhibited as follows: 


| Criterion | Type III | Log-Normal |
|---|---|---|
| Points of inflection | Equidistant from mode | Distances from the mode vary with the skewness, but differ for non-zero skewness |
| High contact | Not present at finite end for large skewness; lower part of curve may have negative curvature | Always present |


APPLICATION 


The use of the normal distribution in applications where the coefficient
of variation is large presents many difficulties. Because the normal
distribution is symmetric about its mean, observed values more than twice
the mean would imply an equal probability of observations with negative
values. Frequently this is a logical absurdity. The use of the
logarithmic-normal distribution has been investigated as a
possible solution to this problem [2, 6, 8, 10, 11, 12, 13, 18, 22].

In a review of the literature Gaddum [5] found that the log-normal 
distribution could be used to describe: 










CRITICAL VALUES OF LOG-NORMAL DISTRIBUTION 601 


(a) The threshold of sensation; 

(b) The size of silver particles in a photographic emulsion; 

(c) The sensitivity to drugs; 

(d) The survival time of insects treated with disinfectants; 

(e) The average size of the different species in each of various phylogenetic 
groups; 

(f) The number of plankton caught in different hauls of a net; and 

(g) The amount of electricity used in medium-class American homes. 


A non-exhaustive review of the literature revealed many other ap- 
plications. Yuan [22] found an excellent fit to weights of female stu- 
dents by the log-normal distribution. Kolmogoroff [15], Halmos [9], 
and Kottler [16] discussed the applicability of this distribution to the 
distribution of sizes of small particles. Epstein [4] derived the log-nor- 
mal distribution as the asymptotic distribution of particle sizes result- 
ing from breakage processes. Wicksell [20] applied the log-normal dis- 
tribution to graduate the frequencies of age upon marriage of bachelors 
and spinsters. Sentence length of various authors was fitted by the log- 
normal distribution by Williams [21]. Application of the log-normal 
distribution to economic data was made by Gibrat [7] and to agricul- 
tural data by Cochran [1]. Recently, Krige [17] applied the log-normal 
curve to the distribution of gold values in the mines of the Witwaters- 
rand. Cureton [3] suggested that the log-normal distribution be used 
to approximate the distribution of means of samples from a finite popu- 
lation in one type of psychological test-item analysis. 


CRITICAL VALUES 


In many applications, it would be convenient to have a table of criti- 
cal values of the log-normal cumulative distribution corresponding to 
specified values of the parameters of the distribution. 

The log-normal probability density function may be written

$$f(x) = \frac{1}{\sqrt{2\pi}\,c\,(x-a)}\exp\left\{-\frac{1}{2c^{2}}\left[\log\frac{x-a}{b}\right]^{2}\right\}, \qquad (1)$$

and the cumulative distribution function is then

$$F(x) = \int_{a}^{x} f(u)\,du, \qquad (2)$$

where we assume b > 0. If b < 0, then a represents the upper rather than
the lower limit of the distribution and all statements may be suitably
modified.








602 AMERICAN STATISTICAL ASSOCIATION JOURNAL, SEPTEMBER 1953 


The parameters a, b, and c are related to the mean (μ), variance (σ²),
and skewness (α₃) as follows:

$$\mu = b\,w^{1/2} + a, \qquad \sigma^{2} = b^{2}\,w\,(w-1), \qquad \alpha_{3} = (w-1)^{1/2}(w+2), \qquad (3)$$

where w = exp(c²). The sign of α₃ is chosen to agree with that of b.
If in (1) we let

$$t = \frac{1}{c}\,\log\frac{x-a}{b}, \qquad (4)$$

then t is distributed normally with zero mean and unit variance. Solving
(4) for x,














$$x = b\,e^{ct} + a, \qquad (5)$$

and solving (3) for b and a,

$$b = \frac{\sigma}{\{w(w-1)\}^{1/2}}, \qquad a = \mu - \frac{\sigma}{(w-1)^{1/2}}. \qquad (6)$$
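A minimal sketch of the moment-matching step follows, assuming positive skewness (b > 0). It finds w from the skewness relation in (3) by bisection (any root finder would serve), then applies (6), and finally recomputes the moments from (3) as a check. The function names are hypothetical and the inputs are the mean, standard deviation and skewness used in the worked example later in the paper.

```python
import math


def solve_w(alpha3):
    """Root w >= 1 of (w - 1) * (w + 2)**2 = alpha3**2, the skewness relation in (3)."""
    target = alpha3 * alpha3
    lo, hi = 1.0, 2.0
    while (hi - 1.0) * (hi + 2.0) ** 2 < target:   # widen the bracket if necessary
        hi *= 2.0
    for _ in range(200):                            # bisection; the left side increases in w
        mid = 0.5 * (lo + hi)
        if (mid - 1.0) * (mid + 2.0) ** 2 < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def params_from_moments(mu, sigma, alpha3):
    """Parameters (a, b, c) of x = b*exp(c*t) + a from mean, s.d. and positive skewness."""
    if sigma <= 0 or alpha3 <= 0:
        raise ValueError("sketch assumes sigma > 0 and positive skewness (b > 0)")
    w = solve_w(alpha3)
    c = math.sqrt(math.log(w))                 # since w = exp(c**2)
    b = sigma / math.sqrt(w * (w - 1.0))       # first relation in (6)
    a = mu - sigma / math.sqrt(w - 1.0)        # second relation in (6)
    return a, b, c


def moments_from_params(a, b, c):
    """Mean, s.d. and skewness implied by (3), assuming b > 0."""
    w = math.exp(c * c)
    return (b * math.sqrt(w) + a,
            b * math.sqrt(w * (w - 1.0)),
            math.sqrt(w - 1.0) * (w + 2.0))


a, b, c = params_from_moments(5.76, 0.81, 0.20)
print(a, b, c)
print(moments_from_params(a, b, c))   # agrees with (5.76, 0.81, 0.20) up to rounding error
```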
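Substituting from (6) into (5),

$$x = \frac{\sigma\,e^{ct}}{\{w(w-1)\}^{1/2}} + \mu - \frac{\sigma}{(w-1)^{1/2}} = \mu + \sigma\,\frac{w^{-1/2}e^{ct} - 1}{(w-1)^{1/2}}. \qquad (7)$$

Then from (7)

$$\tau = \frac{x-\mu}{\sigma} = \frac{w^{-1/2}e^{ct} - 1}{(w-1)^{1/2}}, \qquad (8)$$

where τ may be considered a standardized log-normal variate expressed in
terms of the unit normal deviate t and of w and c; but w and c are each
expressible in terms of α₃. If t_β is defined by

$$\frac{1}{\sqrt{2\pi}}\int_{t_{\beta}}^{\infty} e^{-t^{2}/2}\,dt = \beta, \qquad (9)$$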







then

$$\tau_{\beta} = \frac{w^{-1/2}e^{c\,t_{\beta}} - 1}{(w-1)^{1/2}}. \qquad (10)$$


Thus to obtain, say, the 5 per cent critical value of the upper tail of
the log-normal distribution for α₃ = 2, one first solves the equation

$$w^{3} + 3w^{2} - (4 + \alpha_{3}^{2}) = 0, \qquad (11)$$

obtained by squaring the last relation in (3), for w, which is its only
positive real root. Knowing w, c = √(log w) and t.05 = 1.644854, τ.05 may
be determined from (10). Finally, from (8),

$$x_{\beta} = \mu + \tau_{\beta}\,\sigma. \qquad (12)$$
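The procedure just described can be sketched in Python. This is a minimal illustration, not the method used to compute Table I (which relied on punched-card equipment): equation (11) is solved by simple bisection, t_β is taken from the standard library's `statistics.NormalDist.inv_cdf` using the upper-tail convention of (9), and negative skewness is handled by reflection, since a curve with skewness −α₃ is the mirror image of one with skewness α₃. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def solve_w(alpha3):
    """Positive real root of w**3 + 3*w**2 - (4 + alpha3**2) = 0, equation (11)."""
    target = 4.0 + alpha3 * alpha3
    lo, hi = 1.0, 2.0
    while hi ** 3 + 3.0 * hi ** 2 < target:    # widen the bracket if necessary
        hi *= 2.0
    for _ in range(200):                       # bisection on an increasing function
        mid = 0.5 * (lo + hi)
        if mid ** 3 + 3.0 * mid ** 2 < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def tau_beta(alpha3, beta):
    """Standardized log-normal critical value with upper-tail probability beta, eq. (10)."""
    t_beta = NormalDist().inv_cdf(1.0 - beta)  # unit normal deviate defined by (9)
    if alpha3 == 0:
        return t_beta                          # zero skewness: the normal curve
    if alpha3 < 0:
        return -tau_beta(-alpha3, 1.0 - beta)  # mirror image of the positive case
    w = solve_w(alpha3)
    c = math.sqrt(math.log(w))
    return (math.exp(c * t_beta) / math.sqrt(w) - 1.0) / math.sqrt(w - 1.0)


def critical_x(mu, sigma, alpha3, beta):
    """Critical value on the original scale, equation (12)."""
    return mu + tau_beta(alpha3, beta) * sigma


for beta in (0.05, 0.95):
    print(beta, round(tau_beta(0.20, beta), 3))
```

For α₃ = .20 the two printed values can be compared with the Table I entries 1.699 (β = .05) and −1.586 (β = .95).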


Table I contains values of τ_β for β = .005, .01, .025, .05, .10, .90, .95,
.975, .99, and .995 for α₃ = 0(.05)3.00. The computations were performed
with punched card equipment and rounded to three decimal places.¹ More
extensive hand calculations were used to smooth out second differences
where necessary. The tabular values are believed correct to within one to
two digits in the last decimal place. Three-point Lagrangian interpolation
[19] may be used to give similar accuracy for intermediate values of α₃.
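Three-point Lagrangian interpolation in α₃ can be sketched as follows. The function computes the three-point coefficients directly instead of reading them from the printed tables [19]; the name `lagrange3` is hypothetical, and the three τ.05 values are the Table I entries for α₃ = .20, .25 and .30.

```python
def lagrange3(xs, ys, x):
    """Three-point Lagrange interpolation through (xs[i], ys[i]), i = 0, 1, 2."""
    (x0, x1, x2), (y0, y1, y2) = xs, ys
    l0 = (x - x1) * (x - x2) / ((x0 - x1) * (x0 - x2))
    l1 = (x - x0) * (x - x2) / ((x1 - x0) * (x1 - x2))
    l2 = (x - x0) * (x - x1) / ((x2 - x0) * (x2 - x1))
    return l0 * y0 + l1 * y1 + l2 * y2


# tau_.05 entries of Table I at alpha3 = .20, .25 and .30
alphas = (0.20, 0.25, 0.30)
tau_05 = (1.699, 1.712, 1.724)
print(round(lagrange3(alphas, tau_05, 0.22), 3))   # prints 1.704
```

The coefficients for α₃ = .22 are .48, .64 and −.12; they sum to one, as Lagrangian coefficients must.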


EXAMPLE 


A standard laboratory procedure consists of submitting a certain 
compound in crystalline form to a grinding process. After 2 minutes 
the average diameter of the pieces is determined from a sample of 10, 
selected at random from the ground pieces. Considerable experience 
shows that the means are well described by a log-normal distribution 
with mean 5.76 mm., standard deviation .81 mm. and skewness .20.

A new grinding process is now introduced which may have the property
of displacing the previous distribution to the left, which would indicate
greater efficiency. In any event the shape of the distribution will remain
invariant. A sample of 10, selected from the result of a 2-minute
application of the new process, has a mean diameter of 4.45 mm. Does this
represent a significant departure from the established average for the
older process?





¹ A limited number of tables for α₃ = 0(.01)3.00 to four decimal places and for the same values of β are available
upon request to the writer.







The problem reduces to determining the probability that τ, from
equation (8), will have as extreme a value as that noted. Since the shape
of the curve is invariant, σ = .81 and α₃ = .20. One finds

$$\tau = \frac{4.45 - 5.76}{.81} = -1.617.$$

In Table I, for α₃ = .20, it is seen that −1.617 lies between the tabulated
values −1.586 and −1.865, which correspond to values of β equal to .95 and
.975 respectively. Hence a mean value of 4.45 or less will be obtained
between 2.5 per cent and 5 per cent of the time. One then has good reason
to conclude that the new process is more efficient. If, on the other hand,
one ignored the skewness present and used the normal curve, one would find
that a value of t = −1.617 would occur by chance between 5 per cent and
10 per cent of the time. This may be seen from the entries for α₃ = 0 in
Table I, which reduce to percentage points of the normal curve.
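The probability that the example brackets between 2.5 and 5 per cent can also be computed directly by inverting (8) for t. The sketch below is an illustration under the same assumptions as the example (σ and α₃ known); it is not part of the original procedure, and the helper names are hypothetical.

```python
import math
from statistics import NormalDist


def solve_w(alpha3):
    """Positive real root of w**3 + 3*w**2 - (4 + alpha3**2) = 0, equation (11)."""
    target = 4.0 + alpha3 * alpha3
    lo, hi = 1.0, 2.0
    while hi ** 3 + 3.0 * hi ** 2 < target:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid ** 3 + 3.0 * mid ** 2 < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def lower_tail_probability(tau, alpha3):
    """P(standardized log-normal variate <= tau) for alpha3 > 0, by inverting (8)."""
    w = solve_w(alpha3)
    c = math.sqrt(math.log(w))
    arg = 1.0 + tau * math.sqrt(w - 1.0)   # equals w**(-1/2) * exp(c*t) by (8)
    if arg <= 0.0:
        return 0.0                          # tau lies below the lower limit a
    t = (math.log(arg) + 0.5 * math.log(w)) / c
    return NormalDist().cdf(t)


tau = (4.45 - 5.76) / 0.81                  # observed mean, standardized
print(round(tau, 3))
print(lower_tail_probability(tau, 0.20))    # log-normal curve, skewness .20
print(NormalDist().cdf(tau))                # normal curve, skewness ignored
```

With these inputs the log-normal probability lies between .025 and .05 and the normal-curve probability between .05 and .10, in agreement with the conclusions drawn from Table I.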


COMMENT 


The critical values tabulated apply to cases where σ and α₃ are
known. In many applications there will exist a fund of experience which
will ensure this. Frequently, however, one will have not σ and α₃ but
only estimates s and a₃. If s and a₃ are based on small samples, the use of
Table I may well be invalid.


REFERENCES 


[1] Cochran, W. G., “Some difficulties in the statistical analysis of replicated 
experiments,” Empire Journal of Experimental Agriculture, 6 (1938), 157. 

[2] Cohen, A. C., Jr., “Estimating parameters of logarithmic-normal distribu- 
tions by maximum likelihood,” Journal of the American Statistical Associa- 
tion, 46 (1951), 206-12. 

[3] Cureton, E. E., “A method of item analysis based on the theory of sampling 
from a finite population.” Paper read at the 1952 meeting of the Southern 
Society for Philosophy and Psychology at Knoxville, Tennessee. 

[4] Epstein, Benjamin, “The mathematical description of certain breakage
mechanisms leading to the logarithmico-normal distribution,” Journal of
the Franklin Institute, 244 (1947), 471-77.

[5] Gaddum, J. H., “Lognormal distributions,” Nature, 156 (1945), 463. 

[6] Galton, Francis, “The geometric mean in vital and social statistics,” Pro- 
ceedings of the Royal Society, 29 (1879), 365-67. 

[7] Gibrat, R., Les Inégalités Economiques. Paris: Librairie du Recueil Sirey,
1931.

[8] Gumbel, E. J., “Über ein Verteilungsgesetz,” Zeitschrift für Physik, 37
(1926), 469-80.

[9] Halmos, P. R., “Random alms,” Annals of Mathematical Statistics, 15 
(1944), 182-89. 







[10] Jenkins, T. N., “A short method and tables for the calculation of the aver- 
age and standard deviation of logarithmic distributions,” Annals of Mathe- 
matical Statistics, 3 (1932), 45-55. 

[11] Johnson, N. L., “Systems of frequency curves generated by methods of 
translation,” Biometrika, 36 (1949), 149-76. 

[12] Kapteyn, J. C., Skew Frequency Curves in Biology and Statistics. Groningen: 
Noordhoff, 1903. 

[13] Kapteyn, J. C., and van Uven, M. J., Skew Frequency Curves in Biology and 
Statistics. Groningen: Noordhoff, 1916. 

[14] Kendall, M. G., The Advanced Theory of Statistics, Vol. 2, Second Edition. 
London: Griffin, 1945. 

[15] Kolmogoroff, A. N., “Über das logarithmisch normale Verteilungsgesetz
der Dimensionen der Teilchen bei Zerstückelung,” Comptes Rendus (Dok-
lady) de l’Académie des Sciences de l’URSS, Nouvelle Série, 31 (1941), 99-101.

[16] Kottler, F., “The distribution of particle sizes,” Journal of the Franklin 
Institute, 250 (1950), 339-56 and 419-41. 

[17] Krige, D. G., “A statistical approach to some basic mine valuation problems 
on the Witwatersrand,” Journal of the Chemical, Metallurgical and Mining 
Society of South Africa, 52 (1951), 119-39. 

[18] McAllister, Donald, “The law of the geometric mean,” Proceedings of the 
Royal Society, 29 (1879), 367-76. 

[19] U. S. National Bureau of Standards. National Applied Mathematics Labo-
ratory. Computation Laboratory, Tables of Lagrangian Interpolation Co-
efficients. New York: Columbia University Press, 1948. 

[20] Wicksell, S. D., “On the genetic theory of frequency,” Arkiv för Matematik,
Astronomi och Fysik, 12 (1917), No. 20.

[21] Williams, C. B., “A note on the statistical analysis of sentence length as a
criterion of literary style,” Biometrika, 31 (1940), 356-61.

[22] Yuan, Pae-Tsi, “On the logarithmic frequency distribution and the semi- 
logarithmic correlation surface,” Annals of Mathematical Statistics, 4 (1933), 
30-74. 







TABLE I
CRITICAL VALUES OF LOGARITHMIC-NORMAL DISTRIBUTION

Skewness    τ.10     τ.05     τ.025    τ.01     τ.005
  (α₃)      τ.90     τ.95     τ.975    τ.99     τ.995

  0.00      1.282    1.645−   1.960    2.326    2.576
  0.00     −1.282   −1.645+  −1.960   −2.326   −2.576
  0.05      1.287    1.659    1.984    2.363    2.623
  0.05     −1.276   −1.631   −1.936   −2.290   −2.529
  0.10      1.292    1.673    2.007    2.400    2.671
  0.10     −1.270   −1.616   −1.912   −2.253   −2.483
  0.15      1.296    1.686    2.030    2.437    2.719
  0.15     −1.264   −1.601   −1.889   −2.217   −2.438
  0.20      1.300    1.699    2.053    2.474    2.767
  0.20     −1.258   −1.586   −1.865+  −2.181   −2.393
  0.25      1.304    1.712    2.076    2.512    2.816
  0.25     −1.251   −1.571   −1.841   −2.146   −2.349
  0.30      1.307    1.724    2.099    2.549    2.865−
  0.30     −1.244   −1.556   −1.817   −2.111   −2.305−
  0.35      1.310    1.736    2.121    2.586    2.914
  0.35     −1.237   −1.540   −1.794   −2.077   −2.263
  0.40      1.313    1.748    2.142    2.622    2.963
  0.40     −1.229   −1.525+  −1.770   −2.043   −2.221
  0.45      1.315+   1.759    2.164    2.659    3.012
  0.45     −1.222   −1.509   −1.747   −2.009   −2.180
  0.50      1.317    1.770    2.185−   2.695+   3.061
  0.50     −1.214   −1.494   −1.724   −1.976   −2.140
  0.55      1.318    1.780    2.205+   2.731    3.109
  0.55     −1.206   −1.478   −1.701   −1.944   −2.101
  0.60      1.319    1.789    2.225+   2.767    3.158
  0.60     −1.197   −1.463   −1.678   −1.912   −2.062
  0.65      1.320    1.799    2.245−   2.802    3.206
  0.65     −1.189   −1.447   −1.656   −1.881   −2.025+
  0.70      1.320    1.807    2.263    2.836    3.255−
  0.70     −1.181   −1.432   −1.634   −1.851   −1.988
  0.75      1.320    1.816    2.282    2.871    3.302
  0.75     −1.172   −1.417   −1.612   −1.821   −1.953








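TABLE I (cont.)

Skewness    τ.10     τ.05     τ.025    τ.01     τ.005
  (α₃)      τ.90     τ.95     τ.975    τ.99     τ.995

  0.80      ...      1.824    2.300    ...      ...
  0.80      ...     −1.401   −1.590    ...      ...
  0.85      ...      1.831    2.317    ...      ...
  0.85      ...     −1.386   −1.569    ...      ...
  0.90      ...      1.838    2.334    ...      ...
  0.90      ...     −1.371   −1.549    ...      ...
  0.95      ...      1.844    2.350    ...      ...
  0.95      ...     −1.357   −1.528    ...      ...
  1.00      ...      1.850    2.366    ...      ...
  1.00      ...     −1.342   −1.508    ...      ...
  1.05      ...      1.856    2.381    ...      ...
  1.05      ...     −1.328   −1.489    ...      ...
  1.10      ...      1.861    2.395+   ...      ...
  1.10      ...     −1.313   −1.469    ...      ...
  1.15      ...      1.865+   2.409    ...      ...
  1.15      ...     −1.299   −1.451    ...      ...
  1.20      ...      1.870    2.423    ...      ...
  1.20      ...     −1.286   −1.432    ...      ...
  1.25      ...      1.873    2.436    ...      ...
  1.25      ...     −1.272   −1.414    ...      ...
  1.30      ...      1.877    2.448    ...      ...
  1.30      ...     −1.259   −1.397    ...      ...
  1.35      ...      1.880    2.460    ...      ...
  1.35      ...     −1.246   −1.379    ...      ...
  1.40      ...      1.883    2.471    ...      ...
  1.40      ...     −1.233   −1.362    ...      ...
  1.45      ...      1.885+   2.482    ...      ...
  1.45      ...     −1.220   −1.346    ...      ...
  1.50      ...      1.887    2.492    ...      ...
  1.50      ...     −1.208   −1.330    ...      ...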











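TABLE I (cont.)

Skewness    τ.10     τ.05     τ.025    τ.01     τ.005
  (α₃)      τ.90     τ.95     τ.975    τ.99     τ.995

  1.55      1.283    1.889    ...      ...      ...
  1.55     −1.036   −1.195−   ...      ...      ...
  1.60      1.279    1.890    ...      ...      ...
  1.60     −1.028   −1.183    ...      ...      ...
  1.65      1.275+   1.891    ...      ...      ...
  1.65     −1.020   −1.172    ...      ...      ...
  1.70      1.271    1.892    ...      ...      ...
  1.70     −1.012   −1.160    ...      ...      ...
  1.75      1.267    1.893    ...      ...      ...
  1.75     −1.004   −1.149    ...      ...      ...
  1.80      1.262    1.893    ...      ...      ...
  1.80     −0.996   −1.138    ...      ...      ...
  1.85      1.258    1.893    ...      ...      ...
  1.85     −0.989   −1.127    ...      ...      ...
  1.90      1.253    1.893    ...      ...      ...
  1.90     −0.981   −1.116    ...      ...      ...
  1.95      1.248    1.892    ...      ...      ...
  1.95     −0.974   −1.106    ...      ...      ...
  2.00      1.244    1.892    ...      ...      ...
  2.00     −0.967   −1.096    ...      ...      ...
  2.05      1.239    1.891    ...      ...      ...
  2.05     −0.960   −1.086    ...      ...      ...
  2.10      1.234    1.890    ...      ...      ...
  2.10     −0.953   −1.076    ...      ...      ...
  2.15      1.229    1.888    ...      ...      ...
  2.15     −0.946   −1.067    ...      ...      ...
  2.20      1.224    1.887    ...      ...      ...
  2.20     −0.939   −1.057    ...      ...      ...
  2.25      1.219    1.886    ...      ...      ...
  2.25     −0.932   −1.048    ...      ...      ...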














TABLE I (cont.)

Skewness    τ.10     τ.05     τ.025    τ.01     τ.005
  (α₃)      τ.90     τ.95     τ.975    τ.99     τ.995

  2.30      1.215−   1.884    2.597    3.617    4.453
  2.30     −0.926   −1.039   −1.119   −1.195+  −1.237
  2.35      1.210    1.882    2.600    3.631    4.479
  2.35     −0.919   −1.030   −1.108   −1.182   −1.223
  2.40      1.205−   1.880    2.604    3.645+   4.504
  2.40     −0.913   −1.022   −1.098   −1.170   −1.210
  2.45      1.200    1.878    2.607    3.659    4.528
  2.45     −0.907   −1.013   −1.088   −1.158   −1.197
  2.50      1.195−   1.876    2.610    3.672    4.551
  2.50     −0.901   −1.005−  −1.078   −1.146   −1.184
  2.55      1.190    1.874    2.612    3.684    4.574
  2.55     −0.895+  −0.997   −1.068   −1.135+  −1.171
  2.60      1.185−   1.871    2.614    3.696    4.597
  2.60     −0.889   −0.989   −1.059   −1.123   −1.159
  2.65      1.180    1.869    2.617    3.708    4.619
  2.65     −0.883   −0.981   −1.049   −1.113   −1.148
  2.70      1.175−   1.866    2.618    3.719    4.640
  2.70     −0.877   −0.974   −1.040   −1.102   −1.136
  2.75      1.170    1.863    2.620    3.730    4.661
  2.75     −0.871   −0.966   −1.031   −1.092   −1.125+
  2.80      1.165+   1.861    2.622    3.741    4.681
  2.80     −0.866   −0.959   −1.023   −1.082   −1.114
  2.85      1.160    1.858    2.623    3.751    4.701
  2.85     −0.860   −0.952   −1.014   −1.072   −1.103
  2.90      1.155+   1.855−   2.624    3.761    4.720
  2.90     −0.855−  −0.945+  −1.006   −1.062   −1.093
  2.95      1.150    1.852    2.625−   3.770    4.739
  2.95     −0.850   −0.938   −0.998   −1.053   −1.083
  3.00      1.146    1.849    2.626    3.779    4.757
  3.00     −0.845+  −0.931   −0.990   −1.044   −1.073

















THE PARTITION OF ERROR IN RANDOMIZED BLOCKS* 


O. Kempthorne and W. D. Barclay
Iowa State College 


The present note is concerned with procedures which are followed
in the analysis of experiments. From casual examination of the
data, it is thought that the experimental error is not homogeneous over 
the observations. It is therefore decided to partition the treatment sum 
of squares into components, and to partition the error sum of squares 
correspondingly. Tests of significance are then made by comparing the 
component of the treatment sum of squares with its corresponding er- 
ror component by an F-test, or, if a partition of the treatments is made 
into individual degrees of freedom, by t-tests. See, for example, Cochran 
[2]. Before deciding to base the interpretation on the partitioning of the 
treatment and error sums of squares, it is also somewhat customary to 
make a test of the homogeneity of the components of the error sum of 
squares by means of Bartlett’s test. See, for example, Snedecor [8],
p. 413.

The above procedures must be considered in relation to the re- 
quirements for the analysis of variance. As stressed by various work- 
ers, for example, Cochran [3] and Bartlett [1], there are two distinct 
problems which arise, namely, non-additivity and heterogeneity of
error. It is not in general easy to determine whether these problems
actually arise with a given experiment, though they will often be closely
related in their occurrence. On the problem of non-additivity, there is
available one test, namely Tukey’s test for non-additivity, which is
based on normal-law theory [9]. On the problem of heterogeneity of
error, there are available devices such as the plotting of range against
mean of treatments, and the possibility of applying Bartlett’s test to a 
decomposition of the error sum of squares. 

The relative importance of the two problems depends on the point 
of view which the experimenter and statistician adopt. The first point 
of view is to regard the particular data which one obtains as a random 
sample from the conceptual population of data generated by imposing 
each treatment on each experimental unit, this sample having been 
obtained by choosing an experimental plan at random from a class of 
possible plans. This will be termed the randomization approach and is 
described in detail, for example, in [6]. The second point of view is to
regard the particular data which one obtains as having arisen as a ran- 





* Journal Paper No. J-2270 of the Iowa Agricultural Experiment Station, Ames, Iowa, Project 
No. 890. 













PARTITION OF ERROR IN RANDOMIZED BLOCKS 611 


dom sample from a population specified by a mathematical model with 
normally distributed errors. In the present note, the authors follow the 
first approach. 

The definition of additivity which will be used here is the following: 
that the effect of treatment k is to add a constant to the basic or con- 
trol yield of each plot. If one follows the randomization approach, it is 
fairly easy to see that additivity will result in homogeneity of error 
in the completely randomized experiment, in the sense that variance 
between plots treated alike will be independent of treatment. In the 
case of the ordinary randomized block experiment, it may be deduced 
that with additivity the error variance will be constant for all nor- 
malized treatment comparisons [6]. It appears, therefore, that additiv- 
ity of treatment effects is much more important than homogeneity of 
error, and this is intuitively reasonable since without additivity the 
meaning of estimates of treatment effects and differences is obscure. 
Since non-additivity will generally produce heterogeneity of error [5], 
a significant result from the application of Bartlett’s test could possibly 
be used as an indication of non-additivity in the data and an appropri- 
ate measure, such as a transformation of the data, could then be used. 
It will presumably be possible to have non-additive effects which give 
the same variance for each treatment comparison on the average. How- 
ever, the usual procedure of analysis on the observed scale will be rea- 
sonably efficient in such a situation. 

When the error is heterogeneous, the usual procedure is to make a 
transformation which makes the error as homogeneous as possible. Ad- 
ditivity on the new scale is then assumed. A test of homogeneity of 
variance is therefore desirable, and it is appropriate to consider how 
the distribution of Bartlett’s criterion is affected by non-normality of 
the parent distributions. It has been shown by Fisher [5], Welch [10], 
and Pitman [7], for example, that conventional t-tests and F-tests by 
and large are satisfactory from the randomization point of view. The 
randomization point of view consists of examining the distribution of 
the test criterion over the possible sets of data which could arise in the 
population of possible randomizations. The over-all test for treatments
has been shown to mirror satisfactorily the corresponding randomization
test. It was therefore decided to examine the extent to which some
other tests reflected the corresponding randomization tests. The tests 
considered were the testing of a component of the treatment sum of 
squares against the error sum of squares and Bartlett’s test of homo- 
geneity of variances applied to components of the error sum of squares, 
specified by orthogonal comparisons of the treatments. 







Since a mathematical investigation of these matters is difficult or 
perhaps impossible, an empirical study was made. Using the primary 
data of Eden and Yates [4], which consist of 8 blocks of 4 plots, a sample
of 1,000 of the possible experimental plans for comparing 4 treatments
in randomized blocks was drawn and examined. The treatment sum of
squares was partitioned into three orthogonal single-degree-of-freedom
components and the error into three corresponding parts, each with seven
degrees of freedom. It was found that the F-tests of components of the
treatment sum of squares were satisfactory, but Bartlett’s criterion ap- 
plied to components of the error sum of squares was distributed in a 
markedly different way from expectation under large sample normal 
theory. The sample distribution is summarized in Table 1. 
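The empirical study can be sketched as a simulation. This is an illustrative reconstruction under stated assumptions, not the authors' program: the Eden and Yates yields are not reproduced here, so the usage lines use a hypothetical 8 × 4 array of uniformity yields; the three orthonormal contrasts are one assumed choice of orthogonal comparisons; and Bartlett's criterion is computed in its usual corrected form and referred to 5.991, the 5 per cent point of χ² with 2 degrees of freedom. Each error component is the block-to-block variation of one normalized treatment comparison, so the three components add up to the usual error sum of squares.

```python
import math
import random

# Assumed orthonormal comparisons among 4 treatments (Helmert-type)
CONTRASTS = [
    [1 / math.sqrt(2), -1 / math.sqrt(2), 0.0, 0.0],
    [1 / math.sqrt(6), 1 / math.sqrt(6), -2 / math.sqrt(6), 0.0],
    [1 / math.sqrt(12), 1 / math.sqrt(12), 1 / math.sqrt(12), -3 / math.sqrt(12)],
]


def error_components(yields, plan):
    """Split the error (blocks x treatments) sum of squares into one part per contrast.

    yields[j][p] is the uniformity-trial yield of plot p in block j;
    plan[j][k] is the plot in block j that receives treatment k.
    Each part has (number of blocks - 1) degrees of freedom.
    """
    parts = []
    for c in CONTRASTS:
        # Value of the normalized comparison within each block
        z = [sum(c[k] * row[plan_row[k]] for k in range(4))
             for row, plan_row in zip(yields, plan)]
        z_bar = sum(z) / len(z)
        parts.append(sum((zj - z_bar) ** 2 for zj in z))
    return parts


def bartlett(ss, df):
    """Bartlett's corrected criterion for sums of squares ss, each with df degrees of freedom."""
    k = len(ss)
    n = k * df
    pooled = sum(ss) / n
    m = n * math.log(pooled) - df * sum(math.log(s / df) for s in ss)
    correction = 1.0 + (k / df - 1.0 / n) / (3.0 * (k - 1))
    return m / correction


def proportion_significant(yields, n_plans=1000, critical=5.991, seed=1):
    """Fraction of random plans in which Bartlett's criterion reaches `critical`."""
    rng = random.Random(seed)
    df = len(yields) - 1
    hits = 0
    for _ in range(n_plans):
        plan = [rng.sample(range(4), 4) for _ in yields]   # randomize within each block
        if bartlett(error_components(yields, plan), df) >= critical:
            hits += 1
    return hits / n_plans


# Hypothetical 8 x 4 uniformity data standing in for the Eden and Yates yields
data_rng = random.Random(0)
yields = [[10.0 + data_rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(8)]
print(proportion_significant(yields))
```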


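TABLE 1

DISTRIBUTION OF BARTLETT’S HOMOGENEITY OF ERROR
CRITERION OBTAINED FROM 1000 RANDOM PLANS
COMPARED WITH APPROPRIATE χ² DISTRIBUTION

P class       χ² lower   Observed   Expected           (O−E)²/E
              limit                 (normal theory)

0 to .01        9.210       ...         10                ...
.01 to .02      7.824       ...         10                ...
.02 to .05      5.991       ...         30                ...
.05 to .10      4.605       ...         50                ...
.10 to .20      3.219       ...        100                ...
.20 to .30      2.408       ...        100                ...
.30 to .50      1.386       ...        200                ...
.50 to .70       .713       ...        200                ...
.70 to .80       .446       ...        100                ...
.80 to .90       .211       ...        100                ...
.90 to 1.00     0           ...        100                ...

χ² = 373.335, P < .01.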
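The χ² limits and expected frequencies of Table 1 follow from normal theory alone. Bartlett's criterion for three error components has 3 − 1 = 2 degrees of freedom, and for 2 degrees of freedom the upper-tail probability of χ² is exp(−χ²/2), so each class limit is −2 log P and each expected frequency is 1000 times the width of the P class. A short check, with hypothetical variable names:

```python
import math

# Upper-tail P limits that define the classes of Table 1
P_LIMITS = [0.01, 0.02, 0.05, 0.10, 0.20, 0.30, 0.50, 0.70, 0.80, 0.90]


def chi2_2df_upper_point(p):
    """Point x with P(chi-square, 2 df, >= x) = p, using P = exp(-x/2)."""
    return -2.0 * math.log(p)


limits = [round(chi2_2df_upper_point(p), 3) for p in P_LIMITS]
bounds = [0.0] + P_LIMITS + [1.0]
expected = [round(1000 * (hi - lo)) for lo, hi in zip(bounds, bounds[1:])]
print(limits)
print(expected)
```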







It was found that the verdict of heterogeneity of error based on Bart- 
lett’s test at the 5 per cent level, would be reached in 13.3 per cent of the 
samples. There is therefore a marked tendency to conclude that there 
is heterogeneity of error when in fact each of a complete set of normal- 
ized orthogonal comparisons is subject to the same error variance. It 
is possible, though unlikely, that the present numerical example is 
peculiar, but it seems more reasonable to conclude that it is indicative 
of the general situation. Insofar as this is the case we may conclude 
that the application of Bartlett’s test to the testing of the homogeneity 
of error in a randomized experiment is unreliable and should not be used 
as a general procedure. 

The consequences of concluding that there is heterogeneity of error 
when in fact this does not exist are twofold: (1) a loss in sensitivity
of the experiment and (2) an underestimation of the accuracy of some 
comparisons with an overestimation of the accuracy of other compari- 
sons. The situation is not improved by the experimenter making a sub- 
division of the error sum of squares which is based on the observed re- 
sults. This procedure has two effects: (1) of giving an impression of 
higher sensitivity on some treatment comparisons, and (2) pronounced 
biases in the estimation of errors of the treatment comparisons. 

The present note brings to the fore two problems which need solu- 
tion. The first problem is the extent to which Bartlett’s test, which is 
based on asymptotic theory, can be applied to samples of the sizes
normally encountered. The second problem is the behavior of Tukey’s test
under randomization: if this is not satisfactory, the test procedure is 
not reliable for randomized experiments in which plot errors are large. 
For in this case the experimenter is definitely picking an experimental 
plan at random from the appropriate class of plans and should be sub- 
ject to probabilities of error of the magnitude he chooses by picking a 
significance level. The problem of testing for additivity is crucial in the 
analysis of experiments. It is not reasonable to regard deviations from 
additivity as being additional sources of error variance, particularly 
when these arise as block treatment interactions. 


REFERENCES 


[1] Bartlett, M. S., “The use of transformations,” Biometrics, 3 (1947), 39-52. 

[2] Cochran, W. G., “Some difficulties in the statistical analysis of replicated
experiments,” Empire Journal of Experimental Agriculture, 6 (1938), 157-75. 

[3] Cochran, W. G., “Some consequences when the assumptions for the analysis 
of variance are not satisfied,” Biometrics, 3 (1947), 22-38. 

[4] Eden, T., and Yates, F., “On the validity of Fisher’s z-test when applied to 
an actual sample of non-normal data,” Journal of Agricultural Science, 23 
(1933), 6-16. 













[5] Fisher, R. A., The Design of Experiments, Fourth Edition. New York: Hafner, 
1947. 

[6] Kempthorne, O., The Design and Analysis of Experiments. New York: John 
Wiley and Sons, 1952, Chapter 8. 

[7] Pitman, E. J. G., “Significance tests which may be applied to samples from 
any population. III, The analysis of variance test,” Biometrika, 29 (1937),
322-35. 

[8] Snedecor, G. W., Statistical Methods, Fourth Edition. Ames, Iowa: Collegiate 
Press, 1946. 

[9] Tukey, J. W., “One degree of freedom for non-additivity,” Biometrics, 5
(1949), 232-42. 

[10] Welch, B. L., “On the z-test in randomised blocks and latin squares,” 
Biometrika, 29 (1937), 21-52. 
















SUMMARIES OF PAPERS DELIVERED AT THE 112th 
ANNUAL MEETING OF THE AMERICAN STATISTICAL 
ASSOCIATION IN CHICAGO, DECEMBER 27 TO 30, 1952. 


Edited by Armen A. Alchian, University of California (Los Angeles)


Because the decision to publish summaries of papers delivered at the 1952
annual meetings was not made until the meetings themselves, adequate
advance notice of intent to publish was not given to speakers. Publishing
the summaries without greater delay than that already incurred could be
achieved only by relatively arbitrary editorial action by the Summaries
Editor in amending, emending, or revising the summaries without giving the
authors an opportunity to approve or disapprove of the changes.
Consequently the Summaries Editor accepts all responsibility for any
undesirable changes. All summaries received are published in the following
pages.


PAPERS SUMMARIZED 


Alpert, Harry, The Review and Coordination of Data Collecting Activities
Sponsored by the Federal Government

Altman, I., Statistical Problems Encountered in the Work of the Commission
on Financing of Hospital Care

Anderson, R. L., The Problem of Autocorrelation in Regression Analysis

Anderson, R. L., Recent Advances in Finding Best Operating Conditions

Aroian, Leo, What Makes a Statistical Quality Control Chart Tick?

Bailey, A. L., Credibility Procedures Are Required to Estimate Parameter
Values for Individuals of Heterogeneous Populations

Baker, J., Evaluation of Forecast of the Wheat Crop

Barkin, S., Labor’s View on Actuarial Requirements for Pension Plans

Belz, M. H., Some Recent Applications of Statistics in Australia

Bergson, A., Reliability and Usability of Soviet Statistics: A Summary
Appraisal

Bowman, R. T., Some Notes on the Capacity Concept

Bradley, R., On the Teaching of Statistics: Non-Parametric Methods in the
Elementary Statistics Course

Bronson, Pension Plans: The Concept of Actuarial Soundness

Brunk, M. E., Experimental Designs and Probability Sampling in Marketing
Research

Carter, H., Improving Marriage and Divorce Statistics

Chakravarti, N., Statistical Organization and Estimates of Crops in West
Bengal, India

Cot, What an Economist Wants to Know in the Way of Saving Data

Cornfield, J., Household Survey on Health Conditions and Medical Care
in New York City

Court, A., Climatology’s Needs in Statistical Research

Darroch, G., Organizational, Personnel, and Statistical Problems Facing
the Neophyte Station Statistician

Deardorff, N. R., Household Survey on Health Conditions and Medical
Care in New York City

Densen, P. M., ... the Significance of Vital Statistics Through
Special Studies

Dixon, W. J., Non-Parametric Tests: Power Under Normality

Drake, L. A., City and Area Statistics

Duncan, D. B., Testing the ... of Treatment Means in an Analysis
of Variance of Engineering ...

Dunn, H. L., Elements of a Coordinated System of Vital Records and
Statistics

Dutton, A. M., The Analysis of Biological Time Series

Federer, W. T., Experimental Designs and Probability Sampling in Marketing
Research

Fiedler, F. E., Measurement of Unconscious Attitudes in the Evaluation of
Counseling


















Foley, D. L., Census Tracts and Urban Research
Gerschenkron, A., Reliability of Soviet Industrial and National Income
Statistics
Gershenson, C., A Comparison of Two Different Methods of Caseworker
Judgments of Movement
Goldsmith, R., The Next Steps in the Statistical Study of Saving
Gordon, M. S., Research Design of the Survey of Patterns and Factors in
Mobility in Six Cities
Grouskxopr, H., A Factorial Design Applied to a Specific Chemical Process
and Development Problem
Harvey, G., Forecasting Fruit Production in California
Herepuncee, J., Summary of House Committee Report on Federal Crop
Reporting
..., E., Notes on the Revision of the Consumer Price Index
Hurd, C., The New Electronic Machines and the Future of Statistics
..., The Mathematical Biophysics of the Cardiovascular System
Kempthorne, O., The Randomization Theory of Experimental ...
King, A., Problems of Data Collection Under Federal Sponsorship by Private
Service Agencies
Kitagawa, Components of a Difference Between Two Rates
Krumbein, W., Applications of Statistical Methods to Sedimentary Rocks
Kuznets, G., Forecasting Fruit Production in California
Lebergott, S., Estimates of Labor Force, Employment, and Unemployment
Libby, W., The Radiocarbon Calendar
Limber, D., The Analysis of Counts of the Extragalactic Nebulae in Terms
of a Fluctuating Density Field
Link, R., Exposition of Straight Line Fitting Methods
Livingston, D., The Use of Statistical Techniques in the Accounting
Department of a Large Manufacturing Company
Lorimer, F., The Nature of Soviet Population and Vital Statistics
Lucas, H., Use of Observations Taken Periodically in Growth Studies
Mares, E., Research on Response Errors
Marshall, C., Organization and Scope of Activities of Station Statistician
Marshall, H., Some Recent Developments in Canadian Statistics
Moses, L., General Review of Non-Parametric Methods with Special
Emphasis on Randomization Tests
Mosteller, F., Some Problems in Determining Maxima of Functions of
Several Variables
Musgrave, R., General Equilibrium Aspects of Incidence Theory
..., R., Forecasts of War Production Authorities
Neyman, J., Probabilistic Study of Clustering of Galaxies in a Static and in an
Expanding Universe
Peel, R., Current Developments and Problems in Connection with the Census
Tract Program
Petshek, K., Research on Extent and Scope of Collective Bargaining
Pierson, W., Jr., Ocean Surface Waves
Pincus, ..., Some Statistical Problems in Field Geology
Rapoport, A., Probabilistic Theory of Neural and Social Phenomena
Reiss, A., Jr., Factors in Generation Occupation Mobility
Riney, R., Statistical Problems Encountered in the ... of the Small
Defense Plants Administration
Roberts, D., The Application of Mobility Research to Labor Supply Models
..., A., The Use of Laboratory Experiments in Teaching Probability
Statistics
Rothwell, Naomi D., Problems of Data Collection Under Federal Sponsorship
by Private Service Agencies
Sagen, O., Production of Vital Statistics as a Combined Federal-State
Operation
Scumint, G., A Mathematical Theory of Capillary Exchange as a Function of
Tissue Structure
Scott, E., Probabilistic Study of Clustering of Galaxies in a Static and in an
Expanding Universe
Shryock, H., Jr., Coordination of Population Estimates Used by Federal,
State, and Local Agencies
Sigmond, R., Statistical Problems Encountered in the Work of the Commission
on Financing of Hospital Care


623 
629 


630 
624 


633 


631 
619 


642 
628 
625 
644 
634 


618 
632 
626 
619 
627 
631 


622 
637 


641 
629 
619 
636 
639 
631 
627 
636 
621 
631 
623 
635 
631 
626 
632 
633 


634 
632 


624 
618 
635 
644 
631 
622 
627 


SUMMARIES OF PAPERS AT THE 112TH ANNUAL MEETING 617


Simmons, W., The Elements of an Industrial Classification Policy . . . 641
Smith, R., Technical Aspects of Transportation Flow Data . . . 618
Sorn, L., Needed Improvements in Estimating the Corn Crop . . . 642
Stephan, F., Some Potential Contributions of Mathematics to Social and Economic Statistics . . . 625
Stimson, H. F., Precision Measurements in Thermometry . . . 634
…, H., Some Applications of Statistics to Research in Time and Motion Study
Tucker, R., The Distribution of Government Burdens and Benefits . . . 636
Tukey, J., Multiple Comparisons . . . 624
Volin, L., Agricultural Statistics in Soviet Russia
Wallace, D., Comparison of the Means of Two Samples
Wallace, J., Forecasting and Estimating the Cotton Crop . . . 642
Weiss, S., A New Approach to Capacity Measurement . . . 622
Wilcoxon, F., A Factorial Design Applied to a Specific Chemical Process and Development Problem . . . 631
Witney, S., Problems of Data Collection Under Federal Sponsorship . . . 617
Wolfbein, S., A New Approach to Capacity Measurement . . . 622
Woolsey, T., Use of the Census Current Population Survey to Obtain Information on Morbidity . . . 620
Yamaki, N., Recent Application of Quality Control in Japan
Youden, W., Experimental Designs for the Physical Sciences . . . 625


(Speakers whose summaries were not available are not listed.)


Credibility Procedures Are Required to Estimate Parameter Values for Individuals of Heterogeneous Populations. A. L. Bailey, Lumbermen’s Mutual Casualty Company.

Problems of estimation for which the classical Bayes’ approach would require a knowledge of the functional form of the a priori distribution are discussed. There are many instances in other fields, similar to those found in casualty insurance, where reliable data are available on the mean and variance of the a priori distribution even though the true functional form of that distribution is not known. Such data should be used in conjunction with an assumption that the functional form of the a priori distribution is the simplest form having the desired mean, variance and range. Specifically, the Beta, Gamma and Normal distributions should be assumed when the ranges are $0$ to $1$, $0$ to $+\infty$, and $-\infty$ to $+\infty$, respectively.

Casualty insurance ratemaking has benefited greatly from the use of such procedures which have 
been applied for many years on a rule of thumb basis. The mathematical justification for such pro- 
cedures has recently been developed by the author. In effect, they bridge the gap between the two ex- 
tremes otherwise available to statisticians: one in which they assume that all previously available data 
or data as to other sub-populations is immaterial, and the other in which they assume either that con- 
ditions have not changed from the past or that all sub-populations are homogeneous. 

In many cases the use of available knowledge of the mean and variance of an a priori distribution 
will produce a substantial reduction in the error variance of the estimates being made. 
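As a minimal illustration of the $0$-to-$1$ case, the sketch below assumes binomial data (for example, whether or not each exposure unit produces a claim) and a Beta prior chosen only to match a known mean and variance. The function names and figures are hypothetical. Under these assumptions the Bayes estimate reduces to a credibility-weighted average of the observed rate and the prior mean, with weight $Z = n/(n + a + b)$ on the observed rate.

```python
def beta_from_moments(mean, var):
    """Return (a, b) of the Beta distribution with the given mean and variance."""
    if not 0 < mean < 1:
        raise ValueError("mean must lie in (0, 1)")
    if not 0 < var < mean * (1 - mean):
        raise ValueError("variance must lie in (0, mean*(1-mean))")
    total = mean * (1 - mean) / var - 1  # a + b, from Var = m(1-m)/(a+b+1)
    return mean * total, (1 - mean) * total


def credibility_estimate(prior_mean, prior_var, successes, trials):
    """Posterior mean of a binomial rate under a moment-matched Beta prior.

    Returns (z, estimate), where z is the credibility given to the observed rate.
    """
    a, b = beta_from_moments(prior_mean, prior_var)
    z = trials / (trials + a + b)
    estimate = (a + successes) / (a + b + trials)  # Beta-binomial posterior mean
    return z, estimate


if __name__ == "__main__":
    z, est = credibility_estimate(prior_mean=0.10, prior_var=0.0009, successes=5, trials=20)
    observed = 5 / 20
    # The posterior mean equals z * observed + (1 - z) * prior mean.
    print(round(z, 4), round(est, 4), round(z * observed + (1 - z) * 0.10, 4))
```

The prior variance is what sets the credibility: a small variance (homogeneous sub-populations) gives a large $a + b$ and little weight to the individual's own experience.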


Problems of Data Collection Under Federal Sponsorship. Stephen B. Witney, University of Michigan.


The purpose behind government data collection is most frequently information for its own sake; secondly, policy decision or direction; and, less frequently, basic research having long-term goals. A major problem for the academic scientist is the low priority given to the last. In addition, the social scientist receives a rather small fraction of the total governmental research budget. Six problem areas have general applicability: the general climate in which the contracting governmental agency operates; the general climate within the agency (or program) regarding research and data collection; the nature and content of the research problem; timing, pressure and deadlines; acceptance and understanding of the findings; and the personality, interests and attitudes of the individual sponsors. The broad similarities and few differences between research for government and research for industry are briefly treated.


The Review and Coordination of Data Collecting Activities Sponsored by the Federal Government. Harry Alpert, Bureau of the Budget.

Federal sponsorship of social science and statistical research by non-governmental agencies received 
its greatest impetus in the period immediately following World War II as part of the growing over-all 
program of governmental support of scientific research and development. It is estimated that the Fed- 
eral Government spends annually approximately $2.25 millions for contract and grant research involv- 
ing data collecting activities. 

The Office of Statistical Standards of the Bureau of the Budget, as the central coordinating agency 
of the Federal statistical system, believes that data collections included in contracts and grants spon- 
sored by Federal agencies must be coordinated with other parts of the Federal statistical system in 











618 AMERICAN STATISTICAL ASSOCIATION JOURNAL, SEPTEMBER 1953 


order to avoid unnecessary duplication and unwarranted expenditures, and in order to protect the 
confidence of the general public in the integrity and economical operation of the Federal Government's 
data collecting activities. Several mechanisms, such as waivers, agency self-policing procedures, post- 
audit review, and advisory review, have been developed to achieve the objective of reviewing federally 
sponsored data collections with minimum administrative machinery, and without interference with the 
flexibility and independence of inquiry which are essential to a sound program of scientific research. 

The survey planning process must of necessity include provision for review and coordination. The 
need for basic statistical standards is indicated by several “horror” examples drawn from actual expe- 
rience with contractors for Federal agencies. 

The major arguments in opposition to Budget Bureau review in the area of contract and grant 
research are discussed. Special consideration is given to the problem of the role of the principal inves- 
tigator in contract and grant research and to the issue of academic freedom. It is argued that the prin- 
cipal investigator has an obligation to give sustained personal direction to the project in order that 
maximum advantage be taken of his technical and professional skills. Academic freedom, it is noted, 
includes the freedom not to accept or to seek Federal funds for contract or grant research. It does not include the right to do sloppy work with someone else’s money.

Bad statistics, no matter by whom produced, reflect on all statistics. Statisticians, public and 
private, governmental and academic, have a common objective: to serve society by providing it with 
the finest body of solid, meaningful facts we are capable of producing. 


Problems of Data Collection Under Federal Sponsorship by Private Service Agencies. Arnold J. King and Naomi D. Rothwell, National Analysts, Inc.


The paper presented two problems of data collection by a private research firm under Federal sponsorship: (1) the advantages and disadvantages to the government agency, and (2) some of the barriers to the use of commercial firms in the collection of survey data. The advantages of private agencies are greater speed and efficiency, consistent with high standards; cases were cited. Surveys require seasoned interviewers and a highly trained technical staff experienced in study design, questionnaire construction, sample design and experimental design. Other advantages cited for using commercial firms to collect data are that doing so keeps administrators from being diverted from their major responsibilities, contributes to objectivity, and induces the flow and interchange of ideas. The disadvantages are that data collection is separated from the decision maker, and that the survey may be diverted from testing the hypotheses needed to guide decisions leading to action.

The barriers cited as preventing the use of private research firms by the Federal government are: (1) unawareness by some administrators of the potential usefulness of survey research; (2) lack of knowledge of the contribution which commercial research firms can make to government programs, along with some prejudice against the very word “commercial”; and (3) inability of federal agencies to distinguish between a less reputable, ill-equipped firm and a reputable firm adequately equipped to collect the data needed, resulting in a tendency to buy research on the basis of price alone, which can only lead to lower standards.

The proposed solutions are: (1) strengthening the Bureau of the Budget’s responsibility in contracting out sample surveys, and (2) keeping the personnel of this Bureau better informed about the technical staff, facilities, standards, and ethics of the commercial firms. It is also suggested that the government give more consideration to bringing the government, universities and private agencies together in a coordinated attack on the problem, in which federal and university personnel formulate the research program and interpret the data, and the private agencies take the main responsibility for designing the sample, conducting the field work and processing the data.


Technical Aspects of Transportation Flow Data. R. T. Smith, Interstate Commerce Commission.


This paper discusses the technical aspects of a few of the problems relating to the 1 per cent sample 
of rail carload waybills currently being secured by the Interstate Commerce Commission. The sample 
is selected by the carriers to include all revenue carload waybills numbered “1” or with numbers end- 
ing in the digits “01.” This selection is biased because monthly numbering systems at small stations re- 
sult in an excess of “1” bills. These are subsampled after receipt with probability proportional to the 
number of bills issued per month in the series in order to remove the bias and yield a representative and 
unbiased sample. Complete and accurate selection by the carriers is policed by comparison of the sam- 
ple returns to 1 per cent of each carrier’s Freight Commodity Statistics Report. 
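The bias and its correction can be checked with exact arithmetic. The sketch below is illustrative only: it assumes each station numbers its waybills 1, 2, …, N afresh every month, and it models the subsampling as keeping the bill numbered “1” from a series of fewer than 100 bills with probability N/100 (proportional to the number of bills issued). The function names and that exact correction rule are assumptions, not the Commission's documented procedure.

```python
from fractions import Fraction


def picked_by_rule(n_bills):
    """Bill numbers in a monthly series 1..n_bills that are numbered 1 or end in 01."""
    return [k for k in range(1, n_bills + 1) if k % 100 == 1]


def expected_sampling_rate(n_bills, subsample=True):
    """Expected fraction of a series' bills that reach the final sample."""
    picked = len(picked_by_rule(n_bills))
    if subsample and n_bills < 100:
        # Hypothetical correction: the lone bill "1" is kept with probability
        # n_bills/100, i.e. proportional to the number of bills in the series.
        expected_kept = Fraction(n_bills, 100)
    else:
        expected_kept = Fraction(picked)
    return expected_kept / n_bills


if __name__ == "__main__":
    for n in (10, 40, 500):
        print(n, expected_sampling_rate(n, subsample=False), expected_sampling_rate(n))
```

Without subsampling a station issuing 10 bills a month is sampled at 1 in 10; with it, every short series returns to 1 in 100. In this simplified model a small excess remains for series whose length is not a multiple of 100 (150 bills, for example, yield two picks).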

The techniques used to develop a freight rate index are described. The regularly released waybill 
statistics include a detailed distribution of the reported traffic by commodity class, territorial move- 
ment, type of rate, and length of haul. This produces about 30,000 traffic categories each of which is 
relatively homogeneous with respect to the rate characteristics of the included traffic. A comparison of 
changes in average revenues for comparable categories, weighted by the tonnage for the base year, 
provides the basis for a series of rate indexes. 
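A minimal sketch of that comparison, assuming the index is a base-weighted (Laspeyres-type) aggregate: for each traffic category present in both periods, average revenue per ton in each period is weighted by base-year tonnage. The category keys and figures are invented, and “average revenue” is taken per ton for illustration; the Commission's formula may differ in detail.

```python
def rate_index(base, current):
    """Base-tonnage-weighted index of average revenue per ton (base = 100).

    base, current: dict mapping traffic category -> (tons, revenue).
    Only categories present in both periods are compared.
    """
    common = base.keys() & current.keys()
    if not common:
        raise ValueError("no comparable traffic categories")
    numerator = denominator = 0.0
    for category in common:
        base_tons, base_revenue = base[category]
        cur_tons, cur_revenue = current[category]
        numerator += base_tons * (cur_revenue / cur_tons)      # current rate, base weight
        denominator += base_tons * (base_revenue / base_tons)  # base rate, base weight
    return 100.0 * numerator / denominator


if __name__ == "__main__":
    base = {"A": (10, 20.0), "B": (30, 120.0), "C": (5, 50.0)}
    current = {"A": (12, 26.4), "B": (25, 110.0)}  # category C has no comparable traffic
    print(rate_index(base, current))
```

Fixing the weights at base-year tonnage keeps shifts in traffic mix between categories from being read as rate changes.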



These computed indexes are subject to sampling error and several techniques for estimating their 
standard deviations are described. The first is based upon the separation of the sample bills into several 
subsamples, development of traffic categories and from them indexes for each subsample. An estimate of 
the standard deviation for each index can be made from the variations observed in the subsample in- 
dexes. A second and much less costly, but also less accurate, method estimates the standard deviation 
from variations observed in indexes computed from several subsamples of the initially used traffic categories. These techniques are also applicable to other complex transportation problems.
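The first method is what is often called a random-group (or interpenetrating-subsample) variance estimate. A minimal sketch, assuming the k subsamples are formed at random so that their indexes are independent and identically distributed; the subsample index values below are invented and the function name is hypothetical.

```python
from statistics import mean


def random_group_se(subsample_indexes):
    """Mean of k independent subsample estimates and its standard error."""
    k = len(subsample_indexes)
    if k < 2:
        raise ValueError("need at least two subsamples")
    m = mean(subsample_indexes)
    # Variance of the mean of k i.i.d. estimates: sum of squared deviations / (k(k-1)).
    var_of_mean = sum((x - m) ** 2 for x in subsample_indexes) / (k * (k - 1))
    return m, var_of_mean ** 0.5


if __name__ == "__main__":
    print(random_group_se([101.2, 99.8, 100.5, 100.9]))
```

In practice the full-sample index is usually the one reported, and this standard error is taken as an approximation to its standard error.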


Forecasting Fruit Production in California. G. M. Kuznets, University of California (Berkeley), and George Harvey, Bureau of Agricultural Economics (Sacramento).

Annual production forecasts of substantial accuracy are required for California fruit crops whose disposition is regulated by state or federal marketing agreements. Negative experience with forecasts based on crop ratings by growers or field men has given impetus to the development of objective procedures. The paper deals largely with problems encountered in evolving an efficient forecasting procedure based on measurements of physical characteristics of a maturing crop. For tree fruit crops, such as peaches or pears, the characteristics taken into account are the number of fruit on the tree and fruit size (diameter) at the forecast date. The forecast may take the form of a ratio estimate relating some function
of fruit counts and size measurements in two seasons or a regression procedure which utilizes the rela- 
tion, previously established, between harvest weight of fruit (per tree) and early season fruit counts and 
size measurements. Data collected in 1952 surveys of clingstone peach and Bartlett pear production 
areas in California provided tentative indications of sample size required for specified accuracy and 
made it possible to explore such questions as efficiency of partial (single branch or scaffold) fruit counts, 
accuracy of on-tree fruit counts not requiring destruction of immature fruit, optimum allocation of 
sample blocks and sample trees within blocks, all of which have an obvious bearing on accuracy and cost 
of objective procedures. 
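One simple form of the ratio estimate can be sketched under stated assumptions: the same sample trees are measured at the same forecast date in both seasons, the “function of fruit counts and size measurements” is taken to be count × diameter³ (a rough volume proxy chosen here only for illustration), and last season's harvested production is known. The names and the cubic-diameter choice are hypothetical; the regression alternative would instead apply a previously fitted relation between harvest weight per tree and early-season counts and sizes.

```python
def size_weighted_count(counts, diameters):
    """Hypothetical crop index: sum over sample trees of count * diameter**3."""
    if len(counts) != len(diameters):
        raise ValueError("one diameter per tree is required")
    return sum(c * d**3 for c, d in zip(counts, diameters))


def ratio_forecast(last_production, last_counts, last_diams, counts, diams):
    """Scale last season's production by the change in the sample-tree index."""
    x_last = size_weighted_count(last_counts, last_diams)
    x_now = size_weighted_count(counts, diams)
    return last_production * x_now / x_last


if __name__ == "__main__":
    # Invented figures: two sample trees, mean fruit diameter per tree.
    print(ratio_forecast(1000.0, [100, 200], [2.0, 2.0], [110, 220], [2.0, 2.0]))
```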


Experimental Designs and Probability Sampling in Marketing Research. Max E. Brunk and Walter T. Federer, Cornell University.
This paper is published in full elsewhere in this issue.


The Problem of Autocorrelation in Regression Analysis. R. L. Anderson, North Carolina State College.


Much research has been devoted to the distributions of various statistics used to test the existence 
of autocorrelation of successive observations. Others have studied the problem of estimating parameters 
in various stochastic processes, such as autoregressive and moving average processes. A summary of this 
research is given in this paper. 

Only recently has research been extended to the problem of testing for the existence of autocor- 
related errors in regression models, such as 


$$Y_t = \beta_0 + \sum_{i=1}^{r} \beta_i X_{it} + e_t, \qquad t = 1, 2, \ldots, n,$$
where the X’s are fixed predictors and the e’s are normally distributed with equal variance. Durbin and 
Watson (1950, 1951) present upper and lower bounds on the significance levels for making such tests. 
Moran (1950) presents an exact test for r=1. 
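The Durbin-Watson statistic behind those bounds is simple to compute from least-squares residuals: $d = \sum_{t=2}^{n}(\hat e_t - \hat e_{t-1})^2 / \sum_{t=1}^{n}\hat e_t^2$, which is near 2 when successive errors are uncorrelated and near 0 under strong positive autocorrelation. A minimal sketch for the single-predictor case ($r = 1$) follows; the bounds $d_L$ and $d_U$ still have to be taken from the published tables, and the helper names are hypothetical.

```python
def ols_simple(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def durbin_watson(residuals):
    """Sum of squared successive differences over the residual sum of squares."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    y = [2.1, 2.9, 4.2, 4.8, 6.1, 7.0]
    b0, b1 = ols_simple(x, y)
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    print(durbin_watson(resid))
```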

Too little information is available on the proper methods of estimating the β’s when the e’s are autocorrelated. Aitken (1935) indicated the exact method of transforming the regression variables when the
autocorrelations were known. Champernowne (1948) added to this general theory and presented a 
Bayesian method when the autocorrelations were not known. 

Cochrane and Orcutt (1949) used empirical sampling methods to indicate the effects of autocorrelated errors on the estimates of error and of the β’s. They showed that, in many cases, first differences of the Y’s and X’s would have a relatively uncorrelated error process.
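For first-order autoregressive errors, $e_t = \rho e_{t-1} + u_t$ with $\rho$ known, the transformation amounts to regressing $Y_t - \rho Y_{t-1}$ on $X_t - \rho X_{t-1}$; taking $\rho = 1$ gives the first differences mentioned above. The sketch assumes a single predictor and AR(1) errors, and it drops the first observation (the variant usually associated with Cochrane and Orcutt) rather than keeping it rescaled by $\sqrt{1-\rho^2}$ as the full Aitken (generalized least squares) transformation would. Function names are hypothetical.

```python
def quasi_difference(series, rho):
    """Return z_t - rho * z_{t-1} for t = 2..n (the first observation is dropped)."""
    return [series[t] - rho * series[t - 1] for t in range(1, len(series))]


def ols_simple(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def fit_with_known_rho(x, y, rho):
    """Regress quasi-differenced y on quasi-differenced x; return (beta0, beta1).

    The fitted intercept estimates beta0 * (1 - rho), so it is rescaled here;
    with rho = 1 the intercept is not identified and None is returned for it.
    """
    intercept, slope = ols_simple(quasi_difference(x, rho), quasi_difference(y, rho))
    if rho == 1:
        return None, slope
    return intercept / (1 - rho), slope


if __name__ == "__main__":
    x = [1.0, 2.0, 4.0, 7.0, 11.0]
    y = [2.0 + 3.0 * v for v in x]  # noise-free check: beta0 = 2, beta1 = 3
    print(fit_with_known_rho(x, y, 0.5), fit_with_known_rho(x, y, 1.0))
```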

Watson (1951) has shown the seriousness of using the wrong type of error process and incorrect 
estimates of the autocorrelations in transforming the regression variables. He concludes that the most 
fruitful research seems to be in utilizing more efficiently the estimates of the autocorrelations. 


Use of Observations Taken Periodically in Growth Studies. H. L. Lucas, North Carolina State College. 


Data from two feeding experiments with swine, one experiment with rats and one with steers were 
studied to ascertain the effectiveness of using in the statistical analysis, not only the initial and final 
body weights as is customary in feeding trials but also various numbers of the intermediate weights 
which are taken routinely at regular intervals. Following the approach of some previous authors, poly- 
nomials were fitted to the data for each animal, and the coefficients of the polynomials were analyzed 
to test dietary effects. This was done for both the weights and the logarithms of the weights. As judged by the F ratios (between-diet mean square/within-diet mean square) obtained for the polynomial coefficients up to and including the cubic terms, three intermediate observations appeared to be optimum, i.e., the F ratios were largest with three intermediate observations. The same result was obtained for total gain as estimated from the fitted polynomial. It was noted that application of multivariate procedures might lead to a somewhat different estimate of the optimum number of intermediate observations, as might the use of models which describe the growth curve more accurately than polynomials do.
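A minimal sketch of the per-animal approach, assuming weighings at equally spaced times 0, 1, …, 4 (initial, three intermediate, final) and using NumPy's `polyfit` to obtain each animal's cubic coefficients; diets are then compared on one coefficient at a time with a one-way F ratio. The data are invented and the helper names are hypothetical.

```python
import numpy as np


def cubic_coefficients(times, weights):
    """Least-squares cubic for one animal; returns [b3, b2, b1, b0] (highest power first)."""
    return np.polyfit(times, weights, 3)


def one_way_f(groups):
    """Between-group mean square divided by within-group mean square."""
    values = [v for g in groups for v in g]
    grand = sum(values) / len(values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


if __name__ == "__main__":
    times = np.arange(5)  # hypothetical: initial, three intermediate, final weighings
    diet_a = [50 + (9 + k) * times + 0.4 * times**2 for k in (0.0, 0.3, -0.2)]
    diet_b = [50 + (11 + k) * times + 0.5 * times**2 for k in (0.1, -0.1, 0.2)]
    linear_a = [cubic_coefficients(times, w)[2] for w in diet_a]
    linear_b = [cubic_coefficients(times, w)[2] for w in diet_b]
    print(one_way_f([linear_a, linear_b]))  # F for the linear coefficient
```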


The Analysis of Biological Time Series. ARTHUR M. Dutton, University of Rochester. 


The methods of classical economic time series analysis are aimed primarily at the problem of testing hypotheses about (or estimating the parameters in) a fundamental stochastic model which is assumed to underlie the successive time-ordered observations in a series. The oscillatory properties of such a series, caused by autocorrelation of the errors, are of particular interest for predicting future values of the series.

In biological experimentation involving time-ordered measurements on several plots or individuals in each of several treatment groups, the correlational properties of the observations may be of importance only because they complicate the analysis. Tests of the non-existence of treatment differences in trend (regression on time) are of more importance.

Methods of multivariate analysis of variance for testing the non-existence of differences among the (multivariate) means of several groups consist primarily of computing a statistic analogous to the ratio of treatment mean square to error mean square computed in the univariate case. The distribution of a function of this statistic can be approximated arbitrarily closely by use of the classical $\chi^2$. These methods seem particularly applicable to the type of biological experiment considered, since replication and randomization make the assumptions reasonably valid.

Summarization of the data or transformed data from a single plot or animal in the form of esti- 
mated time regression coefficients may be more meaningful and simpler to analyze as a multiple-variate. 
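A standard statistic of this kind is Wilks' $\Lambda = |W|/|T|$, the ratio of the within-groups to the total sums-of-squares-and-products determinants, and a common chi-square approximation is Bartlett's: $-\left(N - 1 - \tfrac{p + g}{2}\right)\ln\Lambda$ is referred to $\chi^2$ with $p(g-1)$ degrees of freedom, for $N$ animals, $p$ variates and $g$ groups. Taking Bartlett's form as the approximation meant here is an assumption. In the sketch each row is one animal summarized, as suggested above, by its fitted time-regression intercept and slope; the numbers are invented.

```python
import numpy as np


def wilks_bartlett(groups):
    """Wilks' Lambda and Bartlett's chi-square approximation.

    groups: list of (n_i x p) arrays, one per treatment group.
    Returns (lambda, chi2_statistic, degrees_of_freedom).
    """
    groups = [np.asarray(g, dtype=float) for g in groups]
    data = np.vstack(groups)
    n, p = data.shape
    g = len(groups)
    centered = data - data.mean(axis=0)
    total = centered.T @ centered  # total SSCP matrix
    within = sum((x - x.mean(axis=0)).T @ (x - x.mean(axis=0)) for x in groups)
    lam = np.linalg.det(within) / np.linalg.det(total)
    chi2 = -(n - 1 - (p + g) / 2) * np.log(lam)
    return lam, chi2, p * (g - 1)


if __name__ == "__main__":
    # Rows: (intercept, slope) of each animal's fitted regression on time.
    diet_a = [[1.0, 0.50], [1.2, 0.60], [0.9, 0.40], [1.1, 0.55]]
    diet_b = [[1.5, 0.90], [1.6, 1.00], [1.4, 0.80], [1.55, 0.95]]
    print(wilks_bartlett([diet_a, diet_b]))
```

The p-value would then come from a chi-square table (or a library such as SciPy) with the returned degrees of freedom.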


Use of the Census Current Population Survey to Obtain Information on Morbidity. Theodore D. Woolsey, U. S. Public Health Service.


In three separate projects questions relating to illness have been added to the basic schedule of the 
Current Population Survey of the Census Bureau. In the first an inquiry on disabling illness or injury 
lasting one or more days in the previous calendar week was included in four monthly samples. In the 
second, the questions dealt with persons disabled by illness or injury on the day of the interview and 
the length of time such disability had lasted. The most recent investigation dealt with persons in the 
household believed by the respondent to have some form of arthritis or rheumatism, whether such 
cases had been seen by a doctor, and whether the condition had caused the person to change or cut 
down on work or other usual activities. 

The three supplements have provided data of apparently high quality, though of limited scope. In 
the second project mentioned above the value of the results was enhanced by a successful follow-up of 
the more severely disabled cases 37 months after the first of the monthly surveys and 18 months after the second.

Experience with collection of morbidity data by this means has indicated that the method is con- 
venient, speedy, and inexpensive. Estimates derived are applicable to the civilian population of the 
country as a whole, exclusive of the inmates of resident institutions. This factor alone makes the esti- 
mates unique in the field of morbidity. Knowledge of the non-sampling errors would increase their use- 
fulness, but estimates of sampling error which do accompany them are a great advantage. 

Some of the findings pointed to the necessity of clarifying some long-used concepts of morbidity 
statistics. These findings also have some lessons for those studying the labor force. This source cannot 
possibly fill by itself more than a small fraction of the needs for statistics on diseases, injuries, and im- 
pairments in the United States. 


Household Survey on Health Conditions and Medical Care in New York City. Neva R. Deardorff and Jerome Cornfield.


In the Spring of 1952 the Special Research Project of the Health Insurance Plan of Greater New 
York conducted a field inquiry for the purpose of comparing the health conditions and medical care 
received by a sample of its insured families with the conditions of a random sample of the population 
of New York City. These two speakers reviewed the more detailed objectives of the study, the methods employed, and the experience of the Project in carrying the survey plans through to completion and in relating the body of information acquired in the field survey to the other phases of the three-year Research Project now in progress. The Project encompasses not only this Household Survey but also a longitudinal analysis of the insured population over a four-year period. Thus it is possible, in the case of the insured population, to compare what respondents said to interviewers with the reports of the medical groups currently serving these respondents on questions relating to diagnoses and medical services.

Five thousand households were chosen for the sample of insured persons and the same number for 
the sample of the general population. The schedule required an interview averaging fifty-five minutes. 
The Project contracted with Alfred Politz Research, Inc. for the interviewer services. The results of the 
field operations were summarized. 


What Makes a Quality Control Chart Tick. Leo A. Aroian, Hughes Research and Development Laboratories.

The general theory of the effectiveness of a quality control chart used alone or with another chart 
was developed recently by H. Levene and L. A. Aroian (Journal of the American Statistical Association 
45, Dec. 1950, 520-29). The present paper applies the theory to the case of a single quality control 
chart for attributes (the p chart), under a single simple alternative, an increasing trend alternative, a 
rather chaotic alternative, and an erratic periodic alternative. Tables and charts illustrate the theory. 
The results shed light on the proper design of p charts, the choice of the upper and lower control limits, 
and the sample size. The paper will appear in a future issue of Industrial Quality Control. 
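The paper's theory is not reproduced here, but two quantities that drive p chart design can be sketched: the conventional 3-sigma limits $\bar p \pm 3\sqrt{\bar p(1-\bar p)/n}$, and the exact binomial probability that a single sample falls outside them when the true fraction defective is $p$. For a constant (single simple) alternative, the reciprocal of that probability is the average run length. Parameter values are illustrative and the function names hypothetical; trend, chaotic and periodic alternatives need the fuller theory.

```python
from math import comb, sqrt


def p_chart_limits(p_bar, n, k=3.0):
    """k-sigma control limits for the fraction defective, clipped to [0, 1]."""
    sigma = sqrt(p_bar * (1 - p_bar) / n)
    return max(0.0, p_bar - k * sigma), min(1.0, p_bar + k * sigma)


def prob_signal(p_true, n, lcl, ucl):
    """Probability that one sample's fraction defective falls outside [lcl, ucl]."""
    inside = sum(
        comb(n, d) * p_true**d * (1 - p_true) ** (n - d)
        for d in range(n + 1)
        if lcl <= d / n <= ucl
    )
    return 1.0 - inside


if __name__ == "__main__":
    n = 100
    lcl, ucl = p_chart_limits(0.05, n)
    for p in (0.05, 0.10, 0.15):
        prob = prob_signal(p, n, lcl, ucl)
        print(f"p={p:.2f}  P(signal)={prob:.4f}  ARL={1 / prob:.1f}")
```

Changing `n` or `k` and rerunning shows the trade-off between false alarms at $\bar p$ and speed of detection under the alternative.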


Forecasts of the War Production Authorities. Robinson Newcomb, Investors Diversified Services.

The first task in projecting defense expenditures is that of arriving at a judgment as to what programs will survive the conflict of forces over a period of two to five years at least, and what size the programs will be after there has been a resolution, temporary or permanent, of the conflicts. Relatively little attention can be paid to current views; more attention must be given to what the views are likely to be two to five years hence. This is, of course, a problem in sociology and politics as well as in economics.

Once general conclusions have been reached as to the type and magnitude of programs which will 
be supported two to five years hence, attention must be directed to the technical and economic problems 
involved. Detailed studies must be made program by program of the rate at which production and 
deliveries must rise in order to achieve the goals assumed to survive. These rates must be compared 
with feasibility data, again with an eye towards political and economic pressures. The military in many 
instances have set production schedules far above feasibility. Realistic figures must be substituted for 
the military figures in such cases. Finally, judgments must be formed, both in principle and for specific programs, as to whether a final state of readiness will rest primarily on stockpiles or will emphasize more moderate stockpiles plus standby plants.

A review of the defense expenditures forecast by the ODM shows that they were far below those 
generally accepted by economists. Nevertheless, the first forecast made in April 1951 was about 5% too 
high for the fourth quarter of 1952. The December 1952 forecast of $54 billion as the possible peak in ’53 
may also be somewhat high. 

The combination of anticipated defense needs plus defense demands themselves reached a peak in
the first quarter of 1951. Business investment, inventory accumulation, and security expenditures have
represented a declining proportion of economic activity since that time, and in general have created 
less and less pressure on prices. The demand for security expenditures and other investment demands 
in 1953 will be much easier to support than they were in 1951 and 1952. 


Elements of a Coordinated System of Vital Records and Statistics. Halbert L. Dunn, U. S. Public Health Service.

Local, state, federal, and international units are all active in the vital statistics field, either in 
collecting and preserving vital records, in performing essential services to the public in relation to these 
records, or in producing the statistical by-products. The most important problem facing these diverse 
mechanisms is how to function as a coordinated whole.

With each unit doing its share of the job as it independently conceives it, there is much working 
at cross purposes. No matter how laudable these independent goals, nor how close we come to achiev- 
ing them, what counts with the user of vital records and statistics is the total impact. The local health 
officer, the State registrar, the National Office of Vital Statistics, the statistical units of the inter- 
national organizations—none can afford to “do its job” without considering whether it might more 
appropriately be done by another unit and without full awareness of where “its job” begins and the
others leave off. 

As a step toward a coordinated system of vital records and statistics, the author defines his personal
viewpoints as to the objectives and essential elements of such a system, and discusses the respective re- 
sponsibilities of the various levels of government. 













Coordination of Population Estimates Used by Federal, State, and Local Agencies. Henry Shryock, Jr., Bureau of the Census.


In the field of population estimates, the main problem is not one of a plethora of conflicting figures 
from different sources but rather of a lack of official estimates of any kind for most areas. Regularly 
published estimates of the Bureau of the Census cover mostly the United States and States, with some 
classification by age, sex, and color. The staff of the Population Estimates and Forecasts Unit devotes 
from one-quarter to one-third of its time to giving advice (by conference, telephone, or letter) on additional estimates to other governmental agencies and the public. Some of this information on methodology has been published, including methods for current population estimates of cities and counties.

Source data must be obtained from other agencies, especially the National Office of Vital Statistics, the Immigration and Naturalization Service, the Department of Defense, and State Departments
of Education. The nature and availability of these data are discussed from the standpoints of errors in 
the population estimates and lags in their publication. 

The use of population estimates by federal, State, and local agencies is also examined, with em- 
phasis on the specific needs of particular agencies. Many unpublished figures are supplied routinely, or 
on special request, to other federal agencies. The Census Bureau does not have a complete picture of
how State and local agencies use its population estimates. At least one agency in each of 27 States uses 
the Bureau's current estimates of State population, but other State agencies make their own. 

Most States make current estimates for their counties and for cities, but only a few use a method 
suggested by the Census Bureau. The department of health is usually the State agency responsible for 
current estimates. Greatest opportunities for cooperative efforts seem to exist in this area. A small but 
promising step is represented by the annual Public Health Conference on Records and Statistics, which 
has included a work group on “Population Statistics” for several years. Here federal and State statis- 
ticians have been discussing data needs and methodological and procedural possibilities. 


The Analysis of Counts of the Extragalactic Nebulae in Terms of a Fluctuating Density Field. D. Nelson Limber, The Yerkes Observatory.

A method has been developed for analyzing the counts of the extragalactic nebulae on the assumption that the number of nebulae per unit volume at a point $\mathbf{r}$ can be expressed as $\rho(\mathbf{r}) = \bar{\rho}\,[1 + D(\mathbf{r})]$, where $\bar{\rho}$ is a constant and $D(\mathbf{r})$ is a chance variable such that $\overline{D(\mathbf{r})} = 0$, $\overline{D(\mathbf{r})^2} = \delta^2 = \text{constant}$, and $\overline{D(\mathbf{r}_1)\,D(\mathbf{r}_2)} = \delta^2\,\Gamma(|\mathbf{r}_1 - \mathbf{r}_2|)$.

The expressions for the serial correlations between the counts of the nebulae per unit solid angle have been obtained for this model in terms of the parameters $\bar{\rho}$, $\delta^2$, and a micro-scale $r_0$ characterizing the correlation coefficient $\Gamma(|\mathbf{r}_1 - \mathbf{r}_2|)$. This method of analysis makes it possible to include quite simply in the development the effects of the absorption within our own galaxy.

Preliminary results from an analysis of Professor Shane’s counts of the extragalactic nebulae in- 
dicate that the proposed model is quite adequate for explaining the observed general features in a satis- 
factory manner. 
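To make the density model concrete, the sketch below simulates a one-dimensional analogue along a line of sight: a stationary Gaussian $D$ with variance $\delta^2$ and correlation $\Gamma(s) = e^{-s/r_0}$ on a unit lattice, generated as a first-order autoregression, with the micro-scale written `r0`. The Gaussian form, the exponential correlation and the one-dimensional lattice are all assumptions for illustration (the summary does not specify $\Gamma$), and a Gaussian $D$ can give negative densities unless $\delta$ is small. The function names are hypothetical.

```python
import math
import random


def simulate_density(n, rho_bar, delta, r0, seed=0):
    """Simulate rho(x) = rho_bar * (1 + D(x)) at x = 0, 1, ..., n-1.

    D is a stationary Gaussian AR(1) sequence with Var(D) = delta**2 and
    Corr(D(x), D(x + s)) = exp(-s / r0).
    """
    rng = random.Random(seed)
    phi = math.exp(-1.0 / r0)                      # lag-1 correlation Gamma(1)
    innovation_sd = delta * math.sqrt(1 - phi**2)  # keeps Var(D) = delta**2
    d = rng.gauss(0.0, delta)                      # start in the stationary law
    ds = []
    for _ in range(n):
        ds.append(d)
        d = phi * d + rng.gauss(0.0, innovation_sd)
    return [rho_bar * (1 + v) for v in ds], ds


def sample_corr(xs, lag):
    """Lag-`lag` sample autocorrelation."""
    n = len(xs)
    m = sum(xs) / n
    var = sum((x - m) ** 2 for x in xs) / n
    cov = sum((xs[i] - m) * (xs[i + lag] - m) for i in range(n - lag)) / n
    return cov / var


if __name__ == "__main__":
    dens, ds = simulate_density(100_000, rho_bar=1.0, delta=0.3, r0=5.0)
    print(sum(ds) / len(ds), sample_corr(ds, 1), math.exp(-1 / 5))
```

The sample mean of $D$ comes out near 0 and the sample correlations track $e^{-s/r_0}$, which is the structure the serial correlations of the counts are built on.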


Some Notes on the Capacity Concept. Raymond T. Bowman, University of Pennsylvania.


Special attention is given to the notion of capacity for dynamic input-output analysis. The major points reviewed center about the idea of capacity as an output level to which an industry is economically restricted by the fixed stock of particularized capital facilities available at any time, so that output levels beyond capacity will induce additions to the capital stock. Attention is directed to the economic, rather than the physical, limitation of output involved in this notion of capacity; to the difficulty of using a projected historic output rate in excess of the capacity rate as a clear indicator of the time at which investment is induced; to the special difficulties of the concept when individual entrepreneurs, rather than a collective authority, make the decisions to expand capital facilities; to the need for specifying the product mix; to the problem of independence of the capital facilities in the several industries; and to the possible lack of balance in current facilities, with consequent variation in the amount and kind of facilities added in the short period when investment is induced. The major conclusions are that a capacity measure in the sense required can be approximated quantitatively for most industries, but will not permit short-period timing of induced investment demand and will also be weak as a basis for providing invariable capital coefficients for projecting the time sequence of induced investment. The most feasible methods of making the type of capacity estimate required seem to involve asking businessmen what they would do under certain circumstances. Such estimates will be difficult to evaluate.


A New Approach to Capacity Measurement. Samuel Weiss and Seymour Wolfbein, U. S. Dept. of Labor.


Definitions of capacity to produce may utilize an economic or a technological approach. For the purpose of measurement, the Bureau of Labor Statistics adopted a concept which is a cross between these two basic approaches.

SUMMARIES OF PAPERS AT THE 112TH ANNUAL MEETING 623

Obtaining summary measures of output in physical units of production is difficult, because such units are usually non-additive. Varying physical units of output can be made additive by expressing them in terms of dollars or employment; however, it then becomes necessary to deflate these data by indexes of price or of labor productivity.

The Bureau of Labor Statistics has used an employment-man-hour technique which permits the 
estimation of productive capacity. This procedure involves a ratio of potential maximum to current 
man-hours. Such “capacity ratios” were obtained for metal working industries. 

Friction items tend to make the operating level lower than the potential maximum. An approxi- 
mation of the friction can be made and an “index of expansibility” which allows for this friction is sug- 
gested as a more realistic measure. 
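The capacity ratio described above is simple arithmetic, written out below. How the Bureau approximated friction is not stated in the summary, so the `friction` argument here is a hypothetical fractional allowance that scales potential man-hours down before the ratio is taken; the function names are illustrative only.

```python
def capacity_ratio(potential_max_man_hours, current_man_hours):
    """Ratio of potential maximum man-hours to current man-hours."""
    if current_man_hours <= 0:
        raise ValueError("current man-hours must be positive")
    return potential_max_man_hours / current_man_hours


def expansibility_index(potential_max_man_hours, current_man_hours, friction):
    """Hypothetical 'index of expansibility': the capacity ratio after
    reducing potential man-hours by a friction allowance in [0, 1)."""
    if not 0.0 <= friction < 1.0:
        raise ValueError("friction must lie in [0, 1)")
    attainable = potential_max_man_hours * (1.0 - friction)
    return capacity_ratio(attainable, current_man_hours)
```

For example, an industry working 100 (thousand) man-hours with a potential maximum of 150 has a capacity ratio of 1.5; a 10 per cent friction allowance lowers the index to 1.35.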


Census Tracts and Urban Research. Donald L. Foley, University of Rochester.

There has been but limited use of census tract statistics in university-based social research. Urban 
sociologists have been the main consumers, in the research field generally designated as human ecology. 
Five patterns of research use in this field are identified. 

Certain ecological and statistical assumptions implicit in the research use of census tract material are examined: the “natural area” concept, the prospects for general theories of urban spatial patterning, the validity of tract data when used in a statistical index sense, the reliability of tract statistics when sampling is involved, limitations to be recognized in interpreting ecological correlations, and the static framework in which most tract statistics have been cast.

It is recommended that in future social science research (1) more social scientists (especially non-sociologists) be encouraged to use tract statistics; (2) tract data be used in the spirit of providing rough ecological profiles, where they are most effective; (3) the use of tract statistics be integrated with other research approaches; (4) ecological correlations be used only when relating areal characteristics, and not as a substitute for individual correlations; (5) ingenuity be applied to introducing new types and forms of tract information; and (6) within each large city, key researchers be available on a continuing basis to foster cooperative use of the census tract reporting system.
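Recommendation (4) turns on the gap between ecological and individual correlations. The sketch below uses made-up data for two hypothetical tracts and computes both: first correlating residents' values directly, then correlating the tract means. Here the ecological correlation is 1.0 and the individual correlation about 0.88, although within each tract the relation is actually negative.

```python
import math


def pearson(xs, ys):
    """Ordinary product-moment correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def individual_and_ecological(tracts):
    """tracts maps a tract label to a list of (x, y) values for its residents.

    Returns (individual correlation, ecological correlation of tract means).
    """
    xs = [x for people in tracts.values() for x, _ in people]
    ys = [y for people in tracts.values() for _, y in people]
    mean_x = [sum(x for x, _ in people) / len(people) for people in tracts.values()]
    mean_y = [sum(y for _, y in people) / len(people) for people in tracts.values()]
    return pearson(xs, ys), pearson(mean_x, mean_y)


# Hypothetical data: within each tract x and y move in opposite directions.
tracts = {"A": [(1, 2), (2, 1)], "B": [(5, 6), (6, 5)]}
individual_r, ecological_r = individual_and_ecological(tracts)
```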


Current Developments and Problems in Connection with the Census Tract Program. Roy V. Peel, Bureau of the Census.

The Census Bureau has participated in the census tract program primarily by establishing standards and by making tabulations. However, we have now come to the conclusion that a system of small areas with fixed and well-defined boundaries should be established through the extension of the census tract program. Such areas would help in establishing stable administrative units for taking censuses and would enable the publication of data for such stable areas as local needs justify and appropriations permit. A related development is the establishment, for the 1950 Census, of census county divisions as relatively permanent units for presenting statistics for the State of Washington. In a number of other States where minor civil divisions do not have stable boundaries, discussions are going on to explore the possibility of delimiting similar areas. During the last decade, marketing groups took the initiative in acquiring and presenting retail trade data and data from other sources by tracts or groups of tracts, and demonstrated the utility of tracts for marketing uses. As a result, the Bureau of the Census is collaborating in the development of groups of census tracts into retail trade areas for presentation of data.


What an Economist Wants to Know in the Way of Saving Data. Gerhard Colm, National Planning Association.

Balanced economic growth requires that demand for goods and services increase roughly in proportion with productive capacity. The economist concerned with economic growth is interested in saving as one of the factors limiting demand. Since future consumption is customarily projected by deducting an estimate of saving from disposable income, we need to know the “normal” rate of saving that should be assumed for a given disposable income.

Some studies seem to indicate that the rate of saving, in the long run, fluctuates around a rather stable trend line. Others, particularly family budget studies, reveal a persistent tendency for saving to rise with rising incomes. Although we now have a wealth of saving statistics, they do not permit a conclusive answer to the specific and vitally important question: Is the recent rise in the saving ratio due mainly to extraordinary factors, and hence to be discounted in projections of the future saving ratio, or does it largely reflect the fact that incomes have gone up, and hence indicate a still higher saving ratio in the future if incomes continue to rise?

Questions of this type may require the collection of additional statistics, but they particularly require more fruitful methods of organizing the statistics.

624 AMERICAN STATISTICAL ASSOCIATION JOURNAL, SEPTEMBER 1953

Some of the breakdowns and analyses of saving data which would be helpful in projecting the saving ratio are:

(1) Breakdowns of total saving and types of saving by income classes, occupations, and geographic areas, including separate breakdowns of positive saving and of dissaving. (2) A breakdown of positive saving among that which is (a) identical with investment (such as farmers’ changes in inventories or direct investment by noncorporate business), (b) earmarked for a specific purpose (life insurance), and (c) held in liquid form. (3) A breakdown of dissaving among (a) borrowing for the acquisition of an asset, particularly a consumer durable good, (b) borrowing for meeting extraordinary expenses (doctors’ bills), and (c) borrowing to bridge over a period of loss of income because of unemployment or other reasons. (4) Analysis of the extent to which the apparent long-run stability of the saving ratio is due to a redistribution of income—a “reweighting” of the income structure. In this connection, the possibility should be investigated that a rise in incomes of persons at the lower end of the income scale typically leads to dissaving, as these persons for the first time become able to go into debt for automobiles and household appliances. (5) Analysis of the relation of saving to changes in income at various income levels. (6) Analysis of the effects of holdings of savings (including pension rights) on current saving and dissaving. (7) Analysis of the extent to which the saving habits of people in the lower brackets are influenced by the saving habits of people in the higher brackets.
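Item (4) can be made concrete with a little arithmetic: the aggregate saving ratio is an income-weighted average of the saving ratios of income classes, so it moves when the income distribution shifts even if no class changes its behavior. The figures in the sketch are invented for illustration only.

```python
def aggregate_saving_ratio(classes):
    """Total saving divided by total disposable income.

    classes: list of (disposable_income, class_saving_ratio) pairs.
    This equals the average of the class ratios weighted by income shares.
    """
    total_income = sum(income for income, _ in classes)
    total_saving = sum(income * ratio for income, ratio in classes)
    return total_saving / total_income


# Hypothetical two-class economy; each class's own saving ratio is held fixed.
before = [(60.0, 0.00), (40.0, 0.20)]   # (low-income class, high-income class)
after = [(50.0, 0.00), (50.0, 0.20)]    # same class ratios, income reweighted
```

Here the aggregate ratio rises from 0.08 to 0.10 purely through reweighting; conversely, dissaving by newly better-off lower-income families could offset a rise elsewhere and make the aggregate ratio look stable.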


The Next Steps in the Statistical Study of Saving. Raymond W. Goldsmith.


The paper briefly discusses the following twelve steps: full integration of estimates of saving into a system of social accounts; correlation of estimates from balance sheet and income account, and from aggregate and sample data; more detailed explanation of sources and methods of estimates; provision of variant estimates, such as cash saving, saving following business accounting rather than social accounting concepts, and saving using replacement cost and curvilinear depreciation instead of original cost straight-line depreciation; enlarged scope of estimates, to cover in particular government saving, saving through consumer durables, military assets, and land improvement; finer breakdowns by saver groups; finer breakdowns by forms of saving; less “netness” in estimates; appraisal of the margin of error in estimates; appraisal of the motivational significance of estimates; tie-in of data on current saving, cumulated (life) saving, and wealth; and tie-in of quarterly or monthly with annual data.
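The difference between original-cost straight-line depreciation and the variant estimates mentioned can be illustrated numerically. In this sketch, replacement cost is approximated by revaluing original cost with a price index, and fixed-rate declining balance stands in for curvilinear depreciation. Both are common textbook choices assumed here, not necessarily those of the paper. All names and figures are hypothetical.

```python
def straight_line(cost, life_years):
    """Equal annual charge over the asset's life."""
    return cost / life_years


def replacement_cost_straight_line(original_cost, life_years, index_now, index_at_purchase):
    """Straight-line charge on original cost revalued by a price index."""
    replacement_cost = original_cost * index_now / index_at_purchase
    return replacement_cost / life_years


def declining_balance(cost, rate, year):
    """Charge in a given year (1-based) under fixed-rate declining balance,
    one example of a curvilinear depreciation pattern."""
    book_value_at_start = cost * (1.0 - rate) ** (year - 1)
    return book_value_at_start * rate
```

For an asset bought for 1,000 with a 10-year life, straight-line depreciation on original cost is 100 a year. If the price index has risen from 100 to 150, the replacement-cost charge is 150, and saving measured net of depreciation is 50 lower.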


The Use of Laboratory Experiments in Teaching Probability Statistics. A. C. Rosander, George Washington University.

Probability statistics is both inductive and deductive in nature. There is an urgent need to clarify the presentation of this science to laymen, specialists, and management, in order that its true meaning and versatile power be understood. The deductive mathematical approach is necessary but not sufficient: taken alone, its explanations are obscure and it ignores realities. The inductive laboratory approach is needed in order to explain and vitalize the basic principles of probability statistics and their applications.

The laboratory approach to probability statistics bridges the gap between theory and practice, 
and stresses the scientific approach to problems and aspects of problems in various fields including 
management. 

Furthermore, the traditional materials for demonstrating probability (coins, dice, cards, and wheels) need to be supplemented by new devices and apparatus in order to demonstrate important new principles and techniques in sampling, estimation, experimentation, and inference. Four such new devices are described in detail; they are designed to demonstrate the principles of subsampling, group sampling units, problems of sampling design, and the analysis of variance.

In order to obtain the maximum return from the inductive approach to probability statistics, a formal set of 45 experiments is listed and the materials required to perform them are itemized. These experiments, many of which have already been tested in the classroom or on the job, are organized to parallel a systematic development of the subject.

A seven-step procedure is outlined for each experiment, and a regular laboratory manual containing all 45 experiments is recommended. The manual would serve several aims: to guide not only the student but also the teacher in charge of such a laboratory course, to stress the understanding of basic principles, to show how to apply these principles to real problems, to show how to record and process sample data, and to show how to interpret them.

Significant problems encountered, and tentative conclusions reached, based upon experience to 
date with these experiments, are described briefly. Several references to experiments in probability 
statistics are also listed. 
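To illustrate the kind of classroom experiment described (the summary does not list the 45 experiments, so this is not one of them), the sketch below simulates a dice experiment on the sampling distribution of the mean. Many samples of fixed size are drawn, and the observed mean and variance of the sample means are compared with the theoretical values 3.5 and (35/12)/n. A computer simulation is only a stand-in for the physical apparatus the paper advocates; the names are hypothetical.

```python
import random
import statistics


def dice_sample_means(n_samples, sample_size, seed=0):
    """Roll `sample_size` fair dice, `n_samples` times; return each sample's mean."""
    rng = random.Random(seed)
    return [
        statistics.mean(rng.randint(1, 6) for _ in range(sample_size))
        for _ in range(n_samples)
    ]


def compare_with_theory(means, sample_size):
    """Observed vs. theoretical mean and variance of the sample mean."""
    theoretical_var = (35.0 / 12.0) / sample_size   # variance of one die is 35/12
    return {
        "observed_mean": statistics.mean(means),
        "theoretical_mean": 3.5,
        "observed_var": statistics.pvariance(means),
        "theoretical_var": theoretical_var,
    }
```

Repeating the experiment with larger `sample_size` shows the variance of the sample mean shrinking in proportion to 1/n, which is the principle such an experiment is meant to make tangible.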


Multiple Comparisons. John Tukey, Princeton University.


The most useful type of statistically based statement is of the form “so-and-so is equal to such-and-such within thus-and-such a margin of error.” All statisticians realize that any such statement, no matter how wide the margin, is liable to error. In the best-regulated situations we can control the rate at which errors are made (as in simple confidence-limit situations).

If we have a number of determinations (measurements, $x_i$, of certain long-run values, or determinands, $\xi_i$) which we wish to treat as having a common variance $\sigma^2$ (of which we have an unbiased estimate, $s^2$, whose stability corresponds to a given number of degrees of freedom, DDF), then we are faced with a problem of parallel determinations or multiple comparisons, or both. Upon investigation, a number of different and precisely definable types of error rate arise, including: per determination, per batch, batchwise, per comparison, per family, and familywise. An error rate of $p\%$ familywise, for example, means that in $(100 - p)\%$ of all families analyzed, all the comparative statements among the various determinations are within their indicated margins of error.

The numerical details of setting such margins for error, termed allowances, for an error rate of 5% familywise are discussed and the necessary tables given. It is computationally convenient to calculate first the relatively familiar least significant difference, or LSD, which equals $s\sqrt{2}$ times the 5% point of Student’s $|t|$. It is then possible to find the allowance appropriate for simple comparisons of the form $x_i - x_j$ to an adequate approximation as

$$\text{WSD} = \left(A_m + \frac{B_m}{\text{DDF}}\right)(\text{LSD}),$$

where $A_m$ and $B_m$ depend only on the number, $m$, of determinations in the family. (Tables for $1 \le m \le 20$ are given.)

The allowance for any linear combination $\sum c_i x_i$ is the norm of $\{c_i\}$ times the WSD, where this norm is the sum of the positive $c_i$ or minus the sum of the negative $c_i$, whichever is larger.

Numerical examples are discussed, and generalizations and extensions are mentioned. 
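The arithmetic of these allowances is easy to write down. In the sketch below, the caller supplies the two-sided 5% point of Student's t, and the table constants A_m and B_m must come from the paper's tables, which are not reproduced here. The values in the usage lines are placeholders, not Tukey's constants, and the function names are hypothetical.

```python
import math


def lsd(s, t_two_sided_5pct):
    """Least significant difference: s * sqrt(2) times the 5% point of |t|."""
    return s * math.sqrt(2.0) * t_two_sided_5pct


def wsd(lsd_value, a_m, b_m, ddf):
    """Approximate allowance for x_i - x_j: (A_m + B_m / DDF) * LSD.

    a_m and b_m are table constants depending on m; none are built in here.
    """
    return (a_m + b_m / ddf) * lsd_value


def coefficient_norm(coeffs):
    """Larger of (sum of positive c_i) and (minus the sum of negative c_i)."""
    positive = sum(c for c in coeffs if c > 0)
    negative = -sum(c for c in coeffs if c < 0)
    return max(positive, negative)


def allowance(coeffs, wsd_value):
    """Allowance for the linear combination sum(c_i * x_i)."""
    return coefficient_norm(coeffs) * wsd_value


# Placeholder inputs only: s = 2.0, t = 2.0, A_m = 1.2, B_m = 3.0, DDF = 30.
w = wsd(lsd(2.0, 2.0), a_m=1.2, b_m=3.0, ddf=30)
```

For a simple difference x_i − x_j the coefficients are (1, −1), whose norm is 1, so the allowance is just the WSD. Comparing the mean of two determinations with a third, with coefficients (1/2, 1/2, −1), also has norm 1.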


Experimental Designs for the Physical Sciences. W. J. Youden, National Bureau of Standards.


In experimental design a block consists of a number of “treatments,” “varieties” or items grouped 
together in the experimental program. Intrablock comparisons are more precise than comparisons in- 
volving two blocks. For many physical experiments the block size is sharply fixed and frequently ac- 
commodates only two or three items. The high precision of physical measurements makes it unnecessary 
to use many replications. These conditions favor the use of partially balanced designs. The examples given illustrate the use of partially balanced designs for blocks of two.
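A small example of a partially balanced design in blocks of two may help. The cyclic arrangement below for five items uses the blocks {i, i+1 (mod 5)}. Each item appears in two blocks, each item shares a block once with its two "neighbors" (λ₁ = 1), and it never shares one with the other two items (λ₂ = 0). Only five blocks are needed, instead of the ten a balanced design requires to put every pair together. This is a standard textbook construction, assumed here for illustration; the summary does not say which designs the paper uses.

```python
from itertools import combinations


def cyclic_pair_blocks(v):
    """Blocks of size two: (i, i+1 mod v) for i = 0, ..., v-1."""
    return [(i, (i + 1) % v) for i in range(v)]


def replications(blocks, v):
    """Number of blocks containing each item."""
    counts = [0] * v
    for block in blocks:
        for item in block:
            counts[item] += 1
    return counts


def concurrence_classes(blocks, v):
    """Group all item pairs by how many blocks they share."""
    together = {pair: 0 for pair in combinations(range(v), 2)}
    for a, b in blocks:
        together[(min(a, b), max(a, b))] += 1
    classes = {}
    for pair, count in together.items():
        classes.setdefault(count, []).append(pair)
    return classes


blocks = cyclic_pair_blocks(5)
classes = concurrence_classes(blocks, 5)
```

Pairs that never share a block are still compared, but only indirectly through chains of blocks, so those comparisons are made with somewhat less precision than comparisons within a block.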


Some Potential Contributions of Mathematics to Social and Economic Statistics. Frederick F. Stephan, Princeton University.

American statisticians have attempted to meet a growing demand for statistical information for a 
century or more. They have not centralized the collection of economic and social statistics but they 
have sought improvements in accuracy and dependability whenever such data are assembled for general 
use. The accumulation of experience and know-how has led not only to higher quality and wider use but 
to a still greater demand for accuracy. This need for accuracy and for more statistical material cannot be
met in the future merely by experience; some new methods based on mathematics and certain branches 
of applied science are required. 

Mathematics can contribute a precise and powerful language, an instrument of analysis, a vehicle 
for importing useful results from other sciences, and a basis for a systematic theory of the production 
and use of statistical information. Examples of its contributions can be found in counting, classification, 
calibration, measurement, time series, and various other aspects of statistics. R. W. Burgess’ advice to 
the statistical forecaster can be extended by adding practical admonitions to the statistician who tries 
to put mathematics to work on these problems. 


The New Electronic Machines and the Future of Statistics. Cuthbert C. Hurd, International Business Machines Corporation.

The contributions of statisticians to the development of automatic data processing machines are 
discussed. These include work on large problems, such as those furnished by the U. S. Census, as well as the training
of personnel and the writing of procedures for data processing installations. Problems to be solved are 
divided into two general categories, one having little data and a large amount of processing, the other 
having a great deal of data and a relatively small amount of processing. Electronic developments for 
compact, rapid access storage and high speed computing circuits and their relation to stored program 
operation are discussed. The application of new machines to statistical problems is described in con- 
nection with problems of recording, editing, and error detecting. 












Some Statistical Problems in Field Geology. Howard J. Pincus, Ohio State University.


Many geological studies include data describing orientations in two or three dimensions. Analyses are typically directed toward drawing inferences regarding the direction and intensity of factors such as deforming forces and depositional agents. Three-dimensional field studies most commonly use the strike (the direction, with respect to true north, of the line of intersection between the given plane surface and the local horizontal) and the dip (the given plane’s inclination, measured from the horizontal in a vertical plane normal to the strike). The range of strike and of dip is 180°.

Problems of sampling and measurement are often complicated by the high order of variability of 
materials and the paucity of suitable data. 

In studies of rock fractures, shear sets produce bimodal distributions with interdependent con- 
centrations. Problems such as determining sample size, establishing sampling schemes, and estimating 
mean dihedral angles between pairs of planes must await the application of adequate distributions of 
periodic variates. 

Graphic methods have been used for evaluating modes and for simple analyses of both 2- and 
3-dimensional data. 

Observed data have been compared, by chi-square and other tests, with “uniform” or Poisson distributions used as models (as plotted on polar, rectangular, and spherical systems).

Circular normal theory appears to be applicable to some of the orientation problems encountered. Applications are to be presented in detail in the literature. Use of distribution functions provides considerably more information than merely establishing a “significant” departure from arbitrary standards of uniformity.
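Because a strike of 10° and one of 170° describe nearly the same line, an ordinary arithmetic mean (90°) is misleading for such data. A standard device for axial data with a 180° range, shown here as an assumption rather than as the paper's method, is to double the angles, average them as unit vectors, and halve the resulting direction. The mean resultant length R̄ (between 0 and 1) measures concentration and is the basic statistic used in fitting the circular normal (von Mises) distribution. The function name is hypothetical.

```python
import math


def axial_mean(strikes_deg):
    """Mean direction (0-180 deg) and mean resultant length of axial data.

    Angles are doubled so that lines at a and a + 180 coincide, averaged
    as unit vectors, and the resulting direction is halved.
    """
    n = len(strikes_deg)
    c = sum(math.cos(math.radians(2.0 * a)) for a in strikes_deg)
    s = sum(math.sin(math.radians(2.0 * a)) for a in strikes_deg)
    mean_direction = (math.degrees(math.atan2(s, c)) % 360.0) / 2.0
    r_bar = math.hypot(c, s) / n      # 1 = all parallel, near 0 = no preferred trend
    return mean_direction, r_bar
```

For strikes of 10° and 170° this gives a mean direction of 0° (north-south) with R̄ = cos 20° ≈ 0.94, rather than the arithmetic 90°.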


Applications of Statistical Methods to Sedimentary Rocks. W. C. Krumbein, Northwestern University.


Statistical methods find wide application in geology, especially in the study of textures, structures, 
and composition of sedimentary rocks. Certain apparent irregularities in the data, such as highly 
skewed distributions, use of weight instead of number frequencies, use of unequal class intervals, and 
some others, required development of special methods of statistical analysis. In part, use of logarithmic 
transformations and other devices permitted application of conventional methods to the data. Some 
sedimentary attributes approach Gaussian distributions with no complicating factors. Mineral composition data commonly follow binomial or Poisson distributions.

The present paper sketches the development of statistical thinking in sedimentation and includes 
a discussion of some geological problems that can be attacked statistically. The discussion is extended to 
include problems of sampling, relations between sample and population, questions of areal variation in 
sediment properties, design of experiments, and other aspects in need of continuing statistical analysis. 
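Krumbein's phi scale, φ = −log₂(d / 1 mm), is a well-known example of the logarithmic transformations mentioned above; the summary does not say which transformations the paper itself covers. On the phi scale, skewed grain-size data often become roughly symmetric, so ordinary means and standard deviations can be used. The sketch counts each grain once (number frequencies); weight-frequency data would need weighting. Function names are hypothetical.

```python
import math
import statistics


def phi(diameter_mm):
    """Krumbein phi value of a grain diameter given in millimetres."""
    if diameter_mm <= 0:
        raise ValueError("grain diameter must be positive")
    return -math.log2(diameter_mm)


def phi_mean_and_sd(diameters_mm):
    """Arithmetic mean and sample standard deviation on the phi scale."""
    values = [phi(d) for d in diameters_mm]
    return statistics.mean(values), statistics.stdev(values)
```

For grains of 0.5, 0.25, and 0.125 mm, the phi values are 1, 2, and 3. The phi mean is therefore 2 (a diameter of 0.25 mm, the geometric mean of the diameters), and the phi standard deviation is 1.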


Climatology’s Needs in Statistical Research. Arnold Court.


Climatology as a separate science began early in the 19th century, and its development has roughly paralleled that of statistics. Many statistical techniques were applied to climatic problems as soon as they became available, and some were even developed for climatology; conversely, certain graphic statistical techniques, such as the polar diagram (L. von Buch, 1818) and the isopleth diagram (L. Lalanne, 1846), originated in climatic representation. However, despite occasional flurries of interest (H. Meyer, 1891; C. F. Marvin et al., 1915-22), use of statistical methods in climatology has not kept pace with that in other fields.

Climatic analyses fall into two distinct classes: geographical, general, or descriptive, and engineer- 
ing, specific, or predictive. Geographical studies seek to describe all climatic features of one place or area, 
and thus concentrate on the central portions of the frequency distributions of the various elements; 
engineering studies assess the probable frequencies of desirable or harmful occurrences, usually the 
extremes of one or two elements, separately or in combination. 

Proper statistical description and analysis of climatic data are difficult: almost no elements have 
normal distributions, very few sets of data are either independent or of constant variance, and few sets
can be considered as truly random samples from a definable population. All climatic elements are corre- 
lated both in space and in time, and evidence is accumulating that for no element can the expectation be 
safely assumed as constant over the past century—the approximate maximum length of record. 

Adequate statistical techniques are needed for several specific problems. As an absolute measure of 
variability (for such elements as rainfall) the coefficient of variation is used extensively, but its sampling 
distribution for skewed climatic data is unknown. Ordinary procedures in regression and analysis of 
variance assume constancy of variance; modifications are needed for data in which variance is definitely 
not uniform, although the manner of its variation is unknown. Moving averages (“running means”) are 
used in many studies, especially those of climatic fluctuations, but their characteristics are not com- 
pletely known; in particular, there is no satisfactory method for constructing a confidence band about 
such averages. Finally, more attention must be given to the problem of analyzing dependent data in 
which dependence decreases rapidly (in time or space), as do most data of climatology. 
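The two statistics singled out in the last paragraph are easy to compute; what is missing, as the paper notes, is sampling theory for them. The sketch computes running means and a coefficient of variation, using the population standard deviation (an arbitrary choice here). Consecutive running means share all but one observation, which is one reason a confidence band about them is not straightforward. Function names are hypothetical.

```python
import statistics


def running_means(values, window):
    """Means of consecutive, overlapping windows ("running means")."""
    if not 1 <= window <= len(values):
        raise ValueError("window must be between 1 and len(values)")
    total = sum(values[:window])
    means = [total / window]
    for i in range(window, len(values)):
        total += values[i] - values[i - window]   # slide the window by one
        means.append(total / window)
    return means


def coefficient_of_variation(values):
    """Standard deviation (population form) divided by the mean."""
    return statistics.pstdev(values) / statistics.mean(values)
```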






Estimates of Labor Force, Employment, and Unemployment, 1900-1950. Stanley Lebergott, Bureau of the Budget.

Estimates of labor force, employment and unemployment were presented for the years 1900-1950. 
Totals for each of these series are intended to be comparable with the current monthly estimates of the 
Census Bureau. Estimates of employment by major industry group, and for a variety of minor groups,
are comparable with the current series of the Bureau of Labor Statistics. 

These series differ from earlier estimates for a variety of reasons. They are comparable with the 
current official series; they draw upon data which have become available only in recent years; they rest 
on an evaluation of statistical relationships in the entire half century; and they give more explicit al- 
lowance for the possible impacts of cyclical changes and wartime production on employment than some 
earlier estimates could. 


Statistical Problems Encountered in the Work of the Commission on Financing of Hospital Care. 

Isidore Altman and Robert M. Sigmond.

The Commission on Financing of Hospital Care, created to study “the costs of providing adequate hospital services and the determination of the best systems of payment for such services,” is carrying on a number of studies in the fields of hospital fiscal problems, the physician-hospital relationship, prepayment for hospital care, and financing of hospital care for low-income groups. All the studies have statistical aspects. Three of these (coverage of the population by hospital insurance, the characteristics of hospitals with high and low operating costs, and the attitude of the American people toward hospital insurance) are considered, together with the techniques involved. Problems concerning choice of procedure, construction of questionnaires, most appropriate sources of information, evaluation of available data, most fruitful investment of time and energies, etc., are discussed. The role of the statistician in the setting of a study commission is also discussed: to supervise statistical activities of the staff, to educate his co-workers to the careful use of statistical data, and to serve as “philosopher and statesman.”


Some Problems in Determining Maxima of Functions of Several Variables. Frederick Mosteller, Harvard University.

This paper discusses problems in determining extremals (maxima or minima) of functions of several continuous variables. The principal innovation is to propose a way of measuring the “togetherness” of the large values of the function. This measure provides some idea of whether a function will be amenable to sequential techniques like steepest ascent, whether random drawing will do nearly as well, or whether some change in the coordinate system would be desirable.

Some of the advantages of random drawing of points are discussed, and the results of random drawings are compared with sequential techniques; a modification of the random technique that makes it a sequential technique is described. Effects of errors of measurement are discussed briefly.
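Random drawing of points, one of the techniques compared above, can be sketched as follows: draw points uniformly within a box and keep the best. The summary does not specify the "togetherness" measure or the sequential modification, so neither is attempted here; the test function, bounds, and names are hypothetical.

```python
import random


def random_search(f, bounds, n_draws, seed=0):
    """Evaluate f at n_draws points drawn uniformly within bounds; keep the best.

    bounds: list of (low, high) pairs, one per variable.
    """
    rng = random.Random(seed)
    best_point, best_value = None, float("-inf")
    for _ in range(n_draws):
        point = [rng.uniform(low, high) for low, high in bounds]
        value = f(point)
        if value > best_value:
            best_point, best_value = point, value
    return best_point, best_value


def bowl(p):
    """Hypothetical test function with its maximum at (0.3, 0.7)."""
    return -((p[0] - 0.3) ** 2 + (p[1] - 0.7) ** 2)


point, value = random_search(bowl, [(0.0, 1.0), (0.0, 1.0)], n_draws=5000, seed=2)
```

Random drawing makes no use of earlier results. Roughly speaking, that costs little when the large values of a function are scattered, but wastes effort when they lie together in one region that a sequential method such as steepest ascent could climb toward.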


Recent Advances in Finding Best Operating Conditions. R. L. Anderson, North Carolina State College.
This paper discusses various experimental procedures used to estimate the optimum point on a response surface,

$$y = \phi(f_1, f_2, \ldots, f_k),$$

where $y$ is the response and $f_i$ the amount of the $i$th factor used in producing $y$. Multi-factor experiments were first set up to investigate one factor at a time; then Fisher (1935) and Yates (1935, 1937) introduced the complete factorials for field experiments, plus confounded arrangements for incomplete-block designs. More recently, fractional replication designs have been introduced in order to cut down the size of the experiment; see, for example, Kempthorne (1952).

Hotelling (1941) derived methods of locating the optimal point using a single factor. Friedman and 
Savage (1947) outline a sequential one-factor-at-a-time experimental plan when several factors are in- 
volved. 

Box and Wilson (1951) present a method to determine the vicinity of the optimum by use of the 
“path of steepest ascent.” They determine this path from preliminary experiments, assumed far enough 
removed from the optimum so that the response is essentially planar. When one approaches the opti- 
mum, new experiments are used to estimate quadratic and interaction effects. On the basis of a series 
of experiments using the Box-Wilson methods, we concluded: (i) The experimental error must be small. 
(ii) The experimenter must know enough about the response surface so that the nature of the reaction 
does not change as he proceeds from the starting point to the optimum. (iii) He must be able to start 
with factor levels spaced far enough apart to indicate linear effects if they do exist, and still not have 
important interaction and quadratic effects. 
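The first Box-Wilson step can be sketched for two factors: fit a first-order (planar) model to a 2² factorial in coded units (±1), then move from the design center along the fitted gradient. The response values below are constructed without error, not experimental. The path is in coded units, so its direction depends on how the factor levels were scaled. Function names are hypothetical.

```python
import math
from itertools import product


def fit_first_order_2k(design, responses):
    """Least-squares plane y = b0 + sum(b_j * x_j) for a full 2^k factorial
    in coded units (+1/-1); orthogonality makes each b_j a simple contrast."""
    n = len(design)
    k = len(design[0])
    b0 = sum(responses) / n
    b = [sum(x[j] * y for x, y in zip(design, responses)) / n for j in range(k)]
    return b0, b


def steepest_ascent_path(center, b, step, n_steps):
    """Points at distances step, 2*step, ... from center along the fitted gradient."""
    length = math.sqrt(sum(bj * bj for bj in b))
    direction = [bj / length for bj in b]
    return [
        [c + s * step * d for c, d in zip(center, direction)]
        for s in range(1, n_steps + 1)
    ]


design = [list(p) for p in product([-1, 1], repeat=2)]
responses = [10 + 2 * x1 + x2 for x1, x2 in design]   # constructed, noise-free
b0, b = fit_first_order_2k(design, responses)
path = steepest_ascent_path([0.0, 0.0], b, step=1.0, n_steps=3)
```

With real data the fitted coefficients carry error, and a noisy gradient sends the path in the wrong direction. This is one reason behind the first conclusion above, that the experimental error must be small.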






City and Area Statistics—Chamber of Commerce Experience. Leonard A. Drake, Chamber of Commerce of Greater Philadelphia.


There is a great volume of local statistics available to agencies willing to spend the time and funds. So much is available to us in Philadelphia that, with a limited staff, we must forgo the application of advanced statistical procedures to the raw data.

There is very seldom need for anything beyond, say, a moving average for smoothing purposes, elementary sampling techniques, and an occasional seasonal index. Of utmost importance is common sense plus experience in handling the wealth of local data: for example, which data are important and which should be discarded; which statistics are of doubtful accuracy, and to what degree; what the proper methods of tabular presentation are; and how the data may best be illustrated, by diagram or map.

One of the most interesting fields of our statistical work is the projection of population trends— 
total, Negro, school, by age groups, by city subdivisions, and by counties. I have seen some highly re- 
fined statistical measures applied to population projection at both Philadelphia and national levels go 
haywire because of wrong basic assumptions. 

It is much more important to make informed “guesses” relative, for example, to the impact of 
earlier marriages and high birth rates in an era of full employment, or how much impact the area's 
new steel industry will have, than to work out rigid mathematical curves projecting historical data 
twenty or thirty years ahead. 

The use of local, business, and civic statistics, whether in forecasting, projection, or straightaway 
analysis, requires a maximum of common sense, experience, and imagination and a minimum application 
of textbook tools of correlation and other refined statistical manipulations.

There is still great need for educating the business community on the value of statistics and sta- 
tistical, economic, and market research. My method is to make all statistical reports as graphic as 
possible; and in this connection, I seldom use logarithmic scales and take great care to avoid too many 
variables in a diagram. Advanced statistics are taboo in Chamber of Commerce work on three counts: 

a. Our public doesn’t understand.
b. Most of the available raw data do not warrant such treatment.
c. With limited staff, there is no time.


Notes on the Revision of the Consumer Price Index. Edward D. Hollander, Bureau of Labor Statistics.

The Bureau of Labor Statistics has completed the first revision of its Consumer Price Index in 15 years. The revised index is essentially unchanged in purpose, design, and most aspects of measurement. It is designed primarily as a price deflator of wage income. The Consumer Price Indexes are designed as Laspeyres indexes, but in practice fixed-weighted indexes cannot be maintained for very long. There is also a theoretical objection to an index which assumes complete inelasticity of demand across the range of price and income situations.

From a purely theoretical point of view, the purposes of a deflator over time are served by an index 
of the cumulative effect of price changes on the purchasing power of income in two situations, in which 
the products of price changes times quantity (real income) changes equal the changes in money income. 
Such an index, integrating the interactions of price and quantity changes along the historical price- 
quantity path, is equivalent to a series of fixed-weighted indexes in which price and quantity changes are 
continuous. The chain index, as an approximation to the integral index, is also operationally efficient
and flexible for both producers and users of the indexes.
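
The contrast between a fixed-weighted (Laspeyres) index and a chain index can be made concrete with a little arithmetic. The sketch below is illustrative only: the two-item prices and quantities are hypothetical, the function names are ours rather than the Bureau's, and no BLS formula is reproduced. It shows how the chain index reweights each link with the preceding period's basket, so that it follows the observed price-quantity path that a single fixed basket ignores.

```python
def laspeyres(p0, p1, q0):
    """Laspeyres relative: cost of basket q0 at prices p1 divided by its cost at p0."""
    return sum(p * q for p, q in zip(p1, q0)) / sum(p * q for p, q in zip(p0, q0))


def fixed_weight_series(prices, q_base):
    """Index for each period against period 0, holding the period-0 basket fixed."""
    return [laspeyres(prices[0], p_t, q_base) for p_t in prices]


def chain_series(prices, quantities):
    """Chain index: multiply one-period Laspeyres links, each weighted by the
    basket of the preceding period, so the weights follow the actual path."""
    level, series = 1.0, [1.0]
    for t in range(1, len(prices)):
        level *= laspeyres(prices[t - 1], prices[t], quantities[t - 1])
        series.append(level)
    return series


# Hypothetical two-item economy over three periods (illustrative numbers only).
prices = [[1.0, 1.0], [2.0, 1.0], [2.0, 2.0]]
quantities = [[1.0, 1.0], [0.5, 1.5], [0.5, 1.0]]

fixed = fixed_weight_series(prices, quantities[0])
chained = chain_series(prices, quantities)
print(fixed)    # [1.0, 1.5, 2.0]
print(chained)  # last value is approximately 2.4
```

Here buyers shifted toward the second item just before its price doubled, so the chained index rises more than the fixed-basket index; with unchanging quantities the two series coincide.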

As the measure of the price component of a change in money income, the index must be based on 
concepts which clearly differentiate between price changes and all quality-quantity changes that de- 
scribe changes in real income. The former must be built into the pricing diagram; the latter into the 
weights. 

Population characteristics and the standard of living are part of the weights. Changes in expendi- 
tures arising from changes in these characteristics are treated in ways that do not affect the level of the 
index. Economic logic of the index formulation requires that weights change with the manner and level 
of living, because the index is designed as a deflator of wages primarily. For the purpose of the index, 
“expenditure” is defined to exclude any effect of saving. Income is defined as income after personal taxes; 
and income taxes are thus excluded from the weighting and pricing diagram of the index. 

The operational design of the index is a system of stratification and clustering to make the collection 
of data and calculation of indexes as efficient, both statistically and administratively, as possible, with a 
minimal sacrifice of precision. The index includes 46 cities, stratified by size, climate, income and density, 
to represent the entire urban population. The sample of items priced was selected from the universe of
transactions to “represent,” in a judgment sense, all the important classes of goods and services
that enter into the level of living. Because the principal source of variability is the variability of price 
change among items, a number of important items formerly imputed are being priced for the revised 





SUMMARIES OF PAPERS AT THE 112TH ANNUAL MEETING 629 


index. The variability among “outlets” (retail stores and service establishments) appears to be some- 
what less than the variability among items. 

A distinction is made in the revised index between the monthly and annual indexes. The annual 
indexes, calculated with seasonally varying quantity weights and incorporating certain benchmark 
corrections, will be the most precise measures of year-to-year price movement. To the extent that the 
indexes can approximate the price effect of continuous price-quantity changes, they should provide 
suitable deflators of wage and salary income, and better measures than we have had previously of the 
long-term trends in real incomes and standards of living. 


Reliability of Soviet Industrial and National Income Statistics. Alexander Gerschenkron, Harvard University.

The deficiencies of Soviet economic statistics stem from a variety of sources: 1) The economic back- 
wardness of a country with but a brief tradition of mass literacy. 2) The institutional setting of the Five 
Year Plans and the character of the Soviet economy as a “deficit economy” which induce managers of 
industrial enterprises to falsify production reports either in order to give the impression of better results 
than those actually attained or in order to hide actually produced output for purposes of various illicit 
transactions. 3) Distortion of statistics by the central authorities in the interest of propaganda as a rule 
resulting in overstatements of the data on industrial output and national income in general. 4) The 
very small volume of statistical information published. 

These deficiencies severely limit the reliability of Soviet industrial and national income statistics.
As a rule, it has been impossible to make any significant corrections for deficiencies listed under (1) and
(2) above. On the other hand, western scholars have in the past succeeded in many instances in penetrating
the propaganda veil spread by the central authorities and in reaching some significant conclusions.
This was possible because in general such figures as were given did not represent sheer inventions, but 
had some meaning and significance which made it possible for critical analysis to uncover the distortions 
and to attempt corrections. It is unknown whether the same opportunities for critical research will 
exist in the future. The temptations of the cold war may well induce the Soviet authorities to resort to 
publication of data which will be based on nothing but sheer inventions. The extreme paucity of present 
statistical information would facilitate such a course because it could be pursued without much fear of 
obvious inconsistencies and contradictions. 


Reliability and Usability of Soviet Statistics: A Summary Appraisal. Abram Bergson, Columbia University.

For Western students of the Soviet economy, an initial difficulty in the way of serious research so far 
as Soviet statistical data are concerned arises from the Soviet government’s policy of withholding in- 
formation. This policy is not new, but in the course of time the government has become progressively 
more secretive. For some years the government has been withholding statistical data not only on matters 
of immediate military concern, but also on the economy generally. 

As to the statistics that are published, their quality is affected adversely, though to a degree which 
is often conjectural, by a variety of features: falsification and inefficiency in reporting of raw data by 
lower echelons; deficiencies in the collection, processing and publication of data by the higher echelons.
The effect of these limitations most often is to give an unduly favorable impression of the Soviet econ- 
omy, but there are reasons to think, nevertheless, that the higher echelons do not generally resort to 
falsification in the sense of free invention and double bookkeeping. Accordingly there is at least a core 
of fact in Soviet statistics and much of the research being pursued today by Western scholars rests ulti- 
mately on the supposition that this is so. The evidence for this supposition, however, is not now as im- 
pelling as it once was. Accordingly, this notion has to be constantly reviewed. 


The Nature of Soviet Population and Vital Statistics. Frank Lorimer.

Questions concerning the reliability of Soviet statistics during the 1920's are purely technical.
Data on such items as age and mortality were seriously affected by inaccurate or incomplete reporting, 
but the 1926 census data were presented in complete detail, and rapid progress was made in the im- 
provement of vital statistics. 

After about 1930, progress in demographic statistics—in publication and also in technical reliability 
of information available to the government—was eclipsed by a spreading cloud of anxiety and secrecy. 
The first clear indication of an ominous trend in official information on the Soviet population was the 
suppression of regular detailed reports on vital statistics. There can be no doubt that this drastic action 
was motivated chiefly by anxiety to conceal the excess death of several million people during the forced 
collectivization of agriculture and associated disorders (as shown by intensive analysis of later official 
information). It is also true that civil registration was for a time seriously disordered by these calamities. 











630 AMERICAN STATISTICAL ASSOCIATION JOURNAL, SEPTEMBER 1953 


Estimates published in the Second Five-Year Plan show gross errors in the estimates of current 
population, as well as in population projections. Possible reasons for such error are discussed. It is 
probable that the discrepancy between the expected number and that indicated by returns from the 
1937 census was largely responsible for the suppression of the latter and the purge of officials in charge 
of its administration. 

The treatment of the 1939 census reveals the anxiety of Soviet officials to select items of information 
deemed “fit” for publication. The peculiar device of publishing information on age only by broad classes 
without sex, and on sex only without age, was obviously designed to conceal irregularities in age compo- 
sition due to catastrophic events in the early 1930's, and the shortage of adult males which had become 
even more acute than in 1926. Data for political divisions were never published; such information would 
in some cases have revealed very abnormal conditions. Nevertheless, basic data on population available 
to the government at this time must have reached a fairly high level of accuracy. This situation was 
soon disrupted by war. 

New techniques involving quick enumerations, especially in cities, and registration procedures, 
especially in rural areas, were developed during the war and post-war period. Secret data at the disposal 
of the government obtained in this way is being gradually extended and controlled with respect to 
quality, so that such information may now approach tolerable accuracy. 

In conclusion, emphasis is placed on distinctions between the three rather distinct, though related, 
questions of “reportorial fidelity,” “fidelity to science,” and “technical accuracy.” It is assumed that 
official technical releases have, up to the present time, generally respected the first of these principles, 
but since the early years of the regime, have ignored the second. The third of these problems requires 
greater emphasis than it has sometimes received in the use of partial information on the Soviet popula- 
tion by western scholars. 


Agricultural Statistics in Soviet Russia: Their Usability and Reliability. Lazar Volin.


Agricultural statistics are important in Russia because of the importance of the harvest to the 
well-being of the Russian people, as well as to the economic program of both the Tsarist and Soviet 
governments; and, because of the crucial role played on the Russian socio-economic scene by the agrarian 
problem. 

Even before the Second World War significant agricultural information was not as abundant as in 
the 1920's, and, since the war, little has been published. The reliability of the figures is often inferior to 
their quantity. 

Unqualified acceptance of official crop yields and production statistics is opposed. These figures are 
preharvest estimates of the standing crop, which do not take into account the large harvesting losses 
common in the USSR, and lend themselves to over-estimation. 


Measurement of Unconscious Attitudes in the Evaluation of Counseling. Fred E. Fiedler.


The paper indicates a number of problems in evaluation of therapy. A theory is discussed which 
holds that the therapist’s own unconscious attitudes facilitate or inhibit the patient’s ability to express 
his feelings. This requires that we measure the therapist’s unconscious feelings toward the patient. 

A method is presented which presumably measures such attitudes on the part of the therapist. It is
based on measures of similarity between persons, such as Q-technique, but differs from the usual measures
in comparing the subject's self-description with the subject’s prediction of another person's self-description.
Assumed similarity scores are then obtained from these comparisons. The meaning of these measures, 
and some problems inherent in this measurement technique are discussed. 
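
The summary does not name the statistic used to compare a self-description with a predicted self-description. A minimal sketch, assuming a Cronbach-Gleser style profile distance D over Q-sort placements (an assumption, not necessarily the measure used in the paper), shows how one assumed-similarity score could be computed; the item placements are hypothetical.

```python
from math import sqrt


def d_score(profile_a, profile_b):
    """Profile distance D: square root of the summed squared item differences."""
    if len(profile_a) != len(profile_b):
        raise ValueError("both descriptions must cover the same items")
    return sqrt(sum((a - b) ** 2 for a, b in zip(profile_a, profile_b)))


# Hypothetical Q-sort placements (piles 1..9) of six descriptive statements.
therapist_self = [9, 7, 5, 5, 3, 1]
predicted_patient_self = [8, 7, 5, 4, 3, 2]

d = d_score(therapist_self, predicted_patient_self)
print(round(d, 3))  # 1.732; smaller D means greater assumed similarity
```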


A Comparison of Two Different Methods of Caseworkers’ Judgments of Movement. Charles Gershenson,
Jewish Children’s Bureau of Chicago.


In making professional judgments as to movement (deterioration-improvement) of children in 
foster home placements, which of two different rating procedures is more reliable and which is more 
feasible? A group of 19 caseworkers was divided into two samples of 9 and 10 workers. Each worker was 
given a set of 25 summaries of case records describing the movement of children in foster home place- 
ments. A different set of 25 case summaries was presented to each group. The workers were asked to rate 
each case using a 9-point scale ranging from great deterioration to great improvement. After the cases 
were so rated, the workers arranged them in rank order from the case showing the most improvement 
to the case showing the most deterioration. 

Pearsonian correlation coefficients were computed for the 9-point scale and rank correlation coeffi- 
cients for the ordered data. For the group of ten workers the mean reliability of the 9-point rating scale 
was .76 and the corresponding mean reliability for the ranking method was .76. The nine workers of 
the second group had a mean reliability for the rating method of .76 and .79 for the ranking method. 
A comparison of the mean reliability for each of the caseworkers showed a maximum difference for one 
judge of .65 for rating and .80 for the ranking procedure. The remaining judges showed no difference 
greater than .04 between the two reliabilities. 

The caseworkers indicated that some preferred one method and some the other; there was no general
consensus favoring either procedure.
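
The two reliability measures compared above, a Pearson coefficient on the 9-point ratings and a rank correlation on the orderings, can be sketched side by side. This assumes the rank coefficient is Spearman's (computed as the Pearson correlation of average ranks); the summary does not say which rank coefficient was used or how the judge-level means were formed, and the ratings in the example are hypothetical.

```python
from statistics import mean


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def average_ranks(values):
    """Ranks from 1 upward; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # positions i..j (0-based) get one common rank
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Rank correlation computed as the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical 9-point ratings of six cases by two caseworkers.
judge_a = [7, 5, 8, 3, 6, 2]
judge_b = [6, 5, 9, 2, 6, 3]
print(round(pearson(judge_a, judge_b), 3), round(spearman(judge_a, judge_b), 3))
```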


The Radiocarbon Calendar. W. F. Libby, University of Chicago.

The use of natural radioactivities to measure geologic time dates back to the beginning of this 
century. The occurrence of radioactive carbon 14 in nature, due to the action of cosmic rays, provides us 
with an accurate calendar of man’s past, since its half-life of 5600 years coincides with the span of historic 
time. New measurement techniques, using the screen wall counter, have been developed to achieve the 
necessary sensitivity for this very weak activity. These techniques are described, together with some of 
the results already achieved with them. 
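
The calendar rests on exponential decay: after time t the remaining fraction of carbon 14 is 2^(-t/T), with T the half-life. A minimal sketch, assuming a constant initial activity in living material and using the 5600-year half-life quoted above, inverts that relation to turn a measured fraction into an age; counter corrections are ignored.

```python
from math import log

HALF_LIFE_YEARS = 5600.0  # half-life quoted in the summary


def radiocarbon_age(fraction_remaining, half_life=HALF_LIFE_YEARS):
    """Years for C-14 activity to fall to `fraction_remaining` of the living-sample
    level, assuming pure exponential decay from a constant starting level."""
    if not 0.0 < fraction_remaining <= 1.0:
        raise ValueError("fraction must lie in (0, 1]")
    decay_constant = log(2) / half_life
    return log(1.0 / fraction_remaining) / decay_constant


print(round(radiocarbon_age(0.5)))   # 5600, one half-life
print(round(radiocarbon_age(0.25)))  # 11200, two half-lives
```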


Ocean Surface Waves: An Analysis of Their Appearance, Propagation, and Properties in Terms of
Power Spectra, Stationary Time Series and Statistics. Willard J. Pierson, Jr.

A combination of classical hydrodynamics and time series theory is shown to give many observable 
properties of ocean waves. The combination permits adequate statistical descriptions of the waves and 
yields methods which permit the waves to be forecasted both in the storm area and after they have dis- 
persed out of the storm area. The waves are shown to be a quasi-homogeneous Gaussian process. 

A summary of the methods of classical hydrodynamics is given, and it is shown that the classical 
theory does not go far enough in a statistical and practical sense. The methods used by the geophysicist, 
based on averages, are summarized and shown to be inadequate. The early search for the spectrum of 
the waves is reviewed. Then the application of time series theory to defining a realistic power spectrum 
and adequate statistical parameters is given. Finally the results of classical hydrodynamics and time 
series methods are synthesized to obtain results of practical and theoretical usefulness. 


Probabilistic Study of Clustering of Galaxies in a Static and in an Expanding Universe. Jerzy Neyman
and Elizabeth L. Scott, University of California (Berkeley).

This paper reports on a study of the distribution of the galaxies conducted at the University of 
California. Expositions of the various stages of the study have been published from time to time, e.g., 
Astrophysical Journal, 1952 and 1953. 


A Factorial Design Applied to a Specific Chemical Process and Development Problem. H. Gronskopf
and F. Wilcoxon, American Cyanamid Company.

The task of evaluating a chemical pilot reactor capable of processing 120 pounds of raw materials 
per hour was solved by the use of a design for four factors at two levels. Two replications consisting of 
4 four-trial blocks were run. 

An organic liquid and a gaseous feed stream were reacted in the presence of a sulfuric acid catalyst. 
The four scalar factors were: concentration of gas feed, reaction time, reactor pressure, and reactor 
temperature. A fifth variable was introduced by operating with one gas feed nozzle arrangement for 
one replication, and a different arrangement for the second replication. 

Differences due to replications, and therefore due to gas feed arrangements, proved negligible. The 
most important main effect was due to raw material concentration and the important interaction was a 
concentration-reaction time interaction. 

The experiment furnished reliable data for further plant design work and gave an efficient guide to 
best operating conditions. 
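
The layout described (four factors at two levels, each replication split into four blocks of four trials) and the estimation of main effects and interactions from coded contrasts can be sketched as follows. The summary does not say which interactions were confounded with blocks, so the generators ABC and BCD (and hence AD) are a hypothetical choice, and the yields are synthetic values built to contain only a concentration effect and a concentration x time interaction.

```python
from itertools import product
from math import prod

# Columns 0..3 stand for concentration, time, pressure, temperature (coded -1/+1).


def full_factorial(k):
    """All 2**k runs in coded units: -1 = low level, +1 = high level."""
    return [list(run) for run in product((-1, 1), repeat=k)]


def block_of(run, gen1=(0, 1, 2), gen2=(1, 2, 3)):
    """Place a run in one of four blocks by the signs of two interactions.
    ABC and BCD (hence AD) are a hypothetical confounding choice."""
    s1 = prod(run[i] for i in gen1)
    s2 = prod(run[i] for i in gen2)
    return 2 * (s1 > 0) + (s2 > 0)


def effect(runs, y, cols):
    """Mean response where the product of the coded columns is +1, minus the
    mean where it is -1 (a main effect when len(cols) == 1)."""
    plus = [yi for run, yi in zip(runs, y) if prod(run[c] for c in cols) > 0]
    minus = [yi for run, yi in zip(runs, y) if prod(run[c] for c in cols) < 0]
    return sum(plus) / len(plus) - sum(minus) / len(minus)


runs = full_factorial(4)
blocks = [block_of(r) for r in runs]
# Synthetic yields: concentration effect plus a concentration x time interaction.
y = [50 + 4 * a + 1.5 * a * b for a, b, *_ in runs]

print(effect(runs, y, [0]))     # 8.0, concentration main effect
print(effect(runs, y, [1]))     # 0.0, time main effect
print(effect(runs, y, [0, 1]))  # 3.0, concentration x time interaction
```

Running the 16-trial set a second time, as in the two replications described, would allow the replication (nozzle arrangement) contrast to be estimated from the difference of replicate means.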


Non-Parametric Tests: Power Under Normality. W. J. Dixon, University of Oregon and University of
North Carolina.

The power efficiency function E_δ is defined as the ratio of the sample size of the t-test to the sample
size of the test in question, when the two tests have identical power against a fixed alternative δ. This
function is described for the sign test for paired samples of size 5, 10 and 20, and for samples of five or
fewer for the two-sample tests: rank sum, maximum absolute deviation, median, and total number of runs.
The normal alternatives considered differ in mean value.


General Review of Non-Parametric Methods with Special Emphasis on Randomization Tests. Lincoln
E. Moses, Stanford University.

Non-parametric tests appropriate to experiments involving matched pairs are explored in some 
detail; Fisher’s randomization of the observations, the t-approximation to this, the normal approxima- 
tion, Wilcoxon’s signed rank test and the sign test are all considered and illustrated. The character of 
inference in this framework is touched upon. The problem of which test to use is posed. Various two- 
sample and k-sample tests are illustrated in less detail; among these are: randomization of the observa- 
tions, Wilcoxon-Mann-Whitney test, median tests, run test, analysis of variance by ranks (Friedman's 
and Wallis-Kruskal’s). 
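
Fisher's randomization test for matched pairs can be carried out exactly by enumeration when the number of pairs is small. A minimal sketch, assuming a one-sided alternative and hypothetical differences: under the null hypothesis each within-pair difference is equally likely to carry either sign, so all 2^n sign patterns are equally probable.

```python
from itertools import product


def paired_randomization_p(differences, tol=1e-9):
    """One-sided exact p-value of Fisher's randomization test for matched pairs:
    the share of all 2**n sign patterns whose sum of signed |differences| is at
    least the observed sum (tol absorbs floating-point rounding)."""
    observed = sum(differences)
    magnitudes = [abs(d) for d in differences]
    hits = 0
    for signs in product((1, -1), repeat=len(magnitudes)):
        if sum(s * m for s, m in zip(signs, magnitudes)) >= observed - tol:
            hits += 1
    return hits / 2 ** len(magnitudes)


# Hypothetical treatment-minus-control differences for six matched pairs.
diffs = [3.1, 1.4, -0.6, 2.2, 0.9, 1.7]
print(paired_randomization_p(diffs))
```

Full enumeration grows as 2^n, which is why approximations such as the t-approximation and the normal approximation become attractive for larger samples.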


Components of a Difference Between Two Rates. Evelyn M. Kitagawa, University of Chicago.


This paper attempted a revised and systematic statement of the extent to which the difference 
between two rates can be accounted for by differences in the composition or structure of the two groups 
to which the rates refer. The difference between the over-all rates of two groups is separated into two 
major components, one due to differences in the composition and the second due to differences in specific 
rates of the two groups. The former major component is further subdivided into net subcomponents, 
each of which represents that part of the difference between the two over-all rates which is due to differ- 
ences in composition with respect to one factor independent of one or more other factors. For example, in 
a recent study of labor mobility, it was found that 65 per cent of the difference between crude mobility 
rates of Chicago and Los Angeles men was accounted for by differences in their age and migrant com- 
position; furthermore, the difference in migrant composition alone was responsible for virtually all of 
this 65 per cent reduction in mobility differentials. 
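
For a single compositional factor, the split of a difference between two over-all rates into a composition component and a specific-rate component can be sketched with one standard symmetric form, in which each component is weighted by the average of the two groups' values. Whether this matches the paper's exact formulas is an assumption, the further subdivision into factor-specific subcomponents is not shown, and the data are hypothetical.

```python
def crude_rate(composition, rates):
    """Over-all rate as a composition-weighted average of specific rates."""
    return sum(c * r for c, r in zip(composition, rates))


def decompose(comp_a, rates_a, comp_b, rates_b):
    """Split crude(A) - crude(B) into (composition part, specific-rate part);
    the two parts add up exactly to the total difference."""
    rows = list(zip(comp_a, comp_b, rates_a, rates_b))
    composition_part = sum((ca - cb) * (ra + rb) / 2 for ca, cb, ra, rb in rows)
    rate_part = sum((ra - rb) * (ca + cb) / 2 for ca, cb, ra, rb in rows)
    return composition_part, rate_part


# Hypothetical shares and mobility rates for two age groups in two cities.
comp_a, rates_a = [0.6, 0.4], [0.30, 0.10]
comp_b, rates_b = [0.4, 0.6], [0.25, 0.10]

total = crude_rate(comp_a, rates_a) - crude_rate(comp_b, rates_b)
comp_part, rate_part = decompose(comp_a, rates_a, comp_b, rates_b)
print(round(total, 4), round(comp_part, 4), round(rate_part, 4))  # 0.06 0.035 0.025
```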


The Application of Mobility Research to Labor Supply Models. David Roberts, Carnegie Institute of
Technology.

One of the more important factors giving rise to the current interest in constructing labor supply 
models is their relationship to the input-output analysis. That technique yields the dollar outputs of 
each of 192 industries, implied by any given bill of final goods. To carry on from there it is necessary to 
translate dollar outputs into labor requirements and to balance the latter with labor supply in terms of 
area, industry and occupation. Knowledge of labor mobility, including labor force participation in that 
term, is essential in order to pass from the known distribution of the labor force to that which may be 
expected under the postulated conditions. After the population has been projected by age, sex, etc. 
groups, the labor force participation of these groups must be estimated. Less familiar problems arise 
with the attempt to project the distribution of the labor force by area, industry and occupation. 

The question of the stability over time of mobility patterns and rates is of obvious importance here.
Many unresolved issues fall in this area. What occupational groups should be set up? Apart from the
requirement of intra-class homogeneity and inter-class heterogeneity, there is the question of potential
job dilution and relaxation of efficiency standards, which would alter class lines and the mobility factors
based upon them. There is also the question of what determines the direction that movement takes:
is it accidental factors, such as proximity to plants or tips from friends, or do many people follow
career patterns? These and other problems must be explored further before it will be possible to set up
the type of labor supply models envisaged. In the meantime, models of the same type, though necessarily
less accurate and detailed, can be constructed using data such as the 1950 census and the six-city
mobility study, which are now becoming available.


On The Teaching of Statistics: Non-Parametric Methods in the Elementary Statistics Course. Ralph
Bradley, Virginia Agricultural Experiment Station of the Virginia Polytechnic Institute.

An introductory section is devoted to definition of non-parametric statistics and a general con- 
sideration of the teaching of statistics. It is suggested that the major points for discussion of non-para- 
metric methods in the elementary statistics course may be covered by referring to the basic course, the 
methods course, and the introductory course in the theory of statistics. 

Radical changes in the teaching of elementary statistics are not recommended. Suggested reasons 
for the inclusion of work on non-parametric methods are (i) To provide easier or clearer means of illus- 
trating basic principles, (ii) To add interest in those places where non-parametric methods are applica- 
tions of fundamental distribution theory and distributions, (iii) To provide material that serves a basic 
need for various curricula, and (iv) To provide appropriate alternatives to standard tests for situations 
wherein the standard methods may be invalid. The inclusion of specific non-parametric methods is 
discussed for each type of elementary course. An extensive bibliography is included with the paper. It is 
a fairly complete list of references on the teaching of statistics and related topics. 


Probabilistic Theory of Neural and Social Phenomena. Anatol Rapoport, University of Chicago.


The spread of excitation in a neural net is computed probabilistically on the basis of equiprobable
direct connections between any two neurons in the net, where each neuron is assumed to fire whenever 
the number of excitatory stimuli impinging upon it simultaneously sufficiently exceeds the number of 
inhibitory stimuli. The steady state of excitation in the net is derived as a function of the input intensity 
and the net parameters. 

The problem of excitation spread is formally equivalent to a problem treating the spread of a state 
(such as information) through a population. The computed time course of such a spread is compared
with experimental data on message diffusion through populations of school children. It is shown how
the departures of the observed values from the predicted ones can be accounted for by imposing a
“structure” on the net of contacts initially supposed to be random. This leads to a theory of population
structure in terms of existing contacts and suggests modifications of the original theory.
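
The unstructured baseline, before any “structure” is imposed, can be sketched as a step-by-step recursion. The recursion below is an assumption, one common way of writing spread through a net of equiprobable random contacts, and not a formula quoted from the paper: each newly informed member contacts `a` others at random, so an uninformed member escapes them all with probability about exp(-a x_t). The parameter values are hypothetical.

```python
from math import exp


def random_net_spread(a, x0, steps):
    """Cumulative fraction informed after each step in an unstructured random net.

    Newly informed fraction: x_{t+1} = (1 - X_t) * (1 - exp(-a * x_t)),
    where X_t is the cumulative fraction informed so far."""
    new, total = x0, x0
    history = [total]
    for _ in range(steps):
        new = (1.0 - total) * (1.0 - exp(-a * new))
        total += new
        history.append(total)
    return history


print([round(v, 3) for v in random_net_spread(a=2.0, x0=0.01, steps=10)])
```

Observed diffusion that levels off below this curve would be one symptom of the contact structure that the paper introduces.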

The implications of these structural considerations are discussed with reference to possible neural 
mechanisms responsible for the organization of behavior. 


Research Design of the Survey of Patterns and Factors in Mobility in Six Cities. Margaret S. Gordon,
University of California (Berkeley).

Despite recent heightened interest in mobility research, many important questions in this field remain
unanswered. The 1951 Six-City Occupational Mobility Survey (New Haven, Philadelphia, Chicago,
St. Paul, San Francisco, Los Angeles) was an important step in providing more comprehensive
data for analysis of labor mobility patterns and factors. The study attempted to answer the question:
“Are there occupational, industrial, and regional differentials in mobility, of sufficient importance to 
affect manpower planning in a period of industrial mobilization?” While regional differences in mobility 
could not be directly analyzed, factors found to be responsible for inter-city differences may be regarded 
as clues to the probable nature of regional differences. 

Design of the study was characterized by four main features: (1) an enumerative-type interview 
with workers as the respondents; (2) a random sample of the entire labor force (excluding persons under 
25 years of age because of limited labor force experience); (3) analysis of civilian job changes on the
basis of work histories during the period 1940-1949; and (4) use of the Census Bureau’s occupational
and industrial code. 

Findings are preliminary, since the analysis is still incomplete. Like other studies, this one showed 
that mobility rates vary inversely with age and that a minority of workers account for most job shifts. 
Mobility rates also tended to vary inversely with the position of workers in the occupational ladder but 
did not vary significantly among broad industry groups, except for the construction industry where 
workers changed jobs relatively frequently. Job shifts were likely to involve a change in occupation, in 
industry, or in both simultaneously, but workers in certain occupation groups (professional workers, 
female clerical workers, and skilled craftsmen) were relatively unlikely to shift to other groups. While
factors and patterns in mobility were strikingly similar in all six cities, average mobility rates varied 
considerably from city to city, with workers in Los Angeles and San Francisco displaying the greatest 
mobility. Differences in rates of in-migration were primarily responsible for these inter-city variations. 


Factors in Generation Occupation Mobility. Albert J. Reiss, Jr., Vanderbilt University.


This paper presents a statement and evaluation of a technique for the measurement of occupation
mobility and applies it in the analysis of factors in generation occupation mobility in six American cities.
Occupation mobility is an index of the ease or difficulty with which individuals or groups acquire po- 
sitions or jobs open to competition in the labor force. Labor Force Demand Mobility is occupation move- 
ment due to changing demands of the occupation structure. If the size of the occupation groups changes 
over time, it follows that some intergenerational mobility occurs as men are recruited from declining 
occupations into expanding ones. Social Distance Mobility is occupation movement due solely to differ- 
ential evaluation of personal and social characteristics of workers. We need a measurement technique 
to distinguish between the two. The measurement technique is based upon the work of Goldhamer, in which
social distance mobility is defined as the ratio between actual mobility and the amount of mobility we
would expect if there were no relation between the sons’ occupational destination and their occupational
origin. The denominator of Social Distance Mobility is the expected value in conventional contingency
analysis. Expected mobility values therefore represent the amount of movement that would occur if only
availability factors influence occupation movement. One unit is that amount of mobility expected were 
there no relation between fathers’ and sons’ occupational position. 

These ratios permit only a relative comparison of the amount of mobility between occupations, 
since their actual size is a function of the proportionate representation of that occupational class in the 
labor force, or, of the labor force demand factor. Attempts to construct an index based on these ratios 
which permits comparison between occupations have not been successful. This failure may be ascribed 
to a need to take into account change in the labor force demand factor between generations, as well as 
the size of the demand factor. 
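
The ratio described above, observed movement over the movement expected if sons' destinations were independent of fathers' origins, can be computed directly from a fathers-by-sons table, using the usual contingency-table expected counts (row total times column total over grand total). The function name and the table are hypothetical; a ratio of 1.0 corresponds to the “one unit” of mobility, and the sketch assumes no row or column total is zero.

```python
def social_distance_ratios(table):
    """Observed / expected counts for a fathers-by-sons occupation table.

    Rows are fathers' occupations, columns sons' occupations; expected counts
    are row total * column total / grand total (independence of origin and
    destination)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [
        [table[i][j] * n / (row_totals[i] * col_totals[j]) for j in range(len(col_totals))]
        for i in range(len(row_totals))
    ]


# Hypothetical counts: rows/columns are non-manual, manual, farm.
table = [
    [60, 30, 10],
    [40, 90, 20],
    [15, 45, 40],
]
for row in social_distance_ratios(table):
    print([round(r, 2) for r in row])
```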

The substantive question which the study analyzes is, to what extent do migration and educational 
attainment influence generation occupation mobility. The following conclusions were reached. (1) While 
migration, as compared with stability of residence, provides greater opportunity for persons to move 
within the broad occupational orientations of their fathers, it decreases their movement out of these 
broad occupational orientations. (2) Higher education provides roughly equal access to the non-manual 
occupations for men in all occupation groups while the absence of such education severely limits access. 



The Randomization Theory of Experimental Inference. Oscar Kempthorne, Iowa State College.


The paper opens with a discussion of the need for a precise description of the role of randomization 
in the theory of experimental inference. In certain cases, for example, normal law theory is called upon
for the making of probability statements, while in other cases, the model which is used is definitely a 
finite one arising from randomization considerations. The paper is concerned with a restricted class of 
experiments, namely those in which various treatments are being compared. Some discussion is given 
of the criteria by which a theory of inference may be evaluated. After dealing with these preliminary 
questions, the paper is concerned with the basic patterns of comparative experimental designs, as 
follows: (1) the completely randomized design, (2) randomized complete blocks, (3) incomplete block 
designs, (4) Latin square designs and (5) designs in which treatments are applied in sequence to the 
experimental units. 

The essential assumption is that of the existence of a device which produces random numbers.
The mathematical treatment is presented by the use of random variables which specify the distribution
of the treatments on the experimental units. This method of description makes clear the nature of the 
inferences that are being made, and reduces the problems to the consideration of the distribution of 
(usually) simple functions of the random variables. The concept of additivity is defined and the role of 
additivity in experimental inference discussed. 

Finally a comparison of the merits of inferences based on randomization theory with the merits of 
normal law inferences is made. Also some points are noted about randomization inferences based on 
techniques other than the usual analysis of variance of observed values. 
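
For the randomized complete block design, the random-number device and the random variables that allot treatments to units can be sketched concretely. Here Python's seeded pseudo-random generator stands in for the “device which produces random numbers”; the function names, treatment labels and seed are hypothetical. Each block receives every treatment once in an independently shuffled order, and the (t!)^b equally likely layouts form the reference set to which randomization inferences refer.

```python
import random
from math import factorial


def randomize_blocks(treatments, n_blocks, seed=None):
    """Randomized complete blocks: each block gets every treatment once, in an
    order drawn independently by a pseudo-random number generator."""
    rng = random.Random(seed)
    layout = []
    for _ in range(n_blocks):
        order = list(treatments)
        rng.shuffle(order)
        layout.append(order)
    return layout


def n_equally_likely_layouts(n_treatments, n_blocks):
    """Size of the randomization set: (t!)**b equally likely layouts."""
    return factorial(n_treatments) ** n_blocks


plan = randomize_blocks(["A", "B", "C"], n_blocks=4, seed=1953)
for block, order in enumerate(plan, start=1):
    print(f"block {block}: {order}")
print(n_equally_likely_layouts(3, 4))  # 1296
```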


Precision Measurements in Thermometry. H. F. Stimson, National Bureau of Standards.


The accuracy of temperature measurements on the International Temperature Scale, using plati- 
num resistance thermometers, depends on the accuracy of realizing the fixed points and interpolating 
between them. It also depends upon the reproducibility, sensitivity, internal consistency, etc. of the 
measuring instruments. Theory indicates that 5 seconds observing time should be sufficient to determine 
a temperature near 0°C. with a precision of 0.00001° 99 times out of 100. Actual determinations, how- 
ever, fall short of this goal by more than a factor of 10. We are using the statistician’s tool of completely 
orthogonalized latin squares to point out significant factors which need to be improved to enable us to 
make determinations with precision approaching the limit imposed by theory. 
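The abstract does not say how the squares were constructed. As an assumption, for a prime number p of levels the standard cyclic construction gives a complete set of p - 1 mutually orthogonal Latin squares, so each additional square can carry one more factor balanced against rows and columns. A sketch in Python, with hypothetical helper names:

```python
def orthogonal_latin_squares(p):
    """Complete set of p - 1 mutually orthogonal Latin squares of prime order p.

    Square k (k = 1..p-1) has entry (k*i + j) mod p in row i, column j.
    Primality of p guarantees that any two squares are orthogonal.
    """
    return [[[(k * i + j) % p for j in range(p)] for i in range(p)]
            for k in range(1, p)]


def is_latin(square):
    """True if every symbol occurs exactly once in each row and each column."""
    n = len(square)
    symbols = set(range(n))
    rows_ok = all(set(row) == symbols for row in square)
    cols_ok = all({square[i][j] for i in range(n)} == symbols for j in range(n))
    return rows_ok and cols_ok


def are_orthogonal(s, t):
    """True if superimposing s and t yields every ordered pair of symbols once."""
    n = len(s)
    pairs = {(s[i][j], t[i][j]) for i in range(n) for j in range(n)}
    return len(pairs) == n * n
```

For order 5 this yields four squares; which experimental factors (for example observer, day, or bridge setting) would be assigned to which square is left to the experimenter and is not specified in the abstract.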


Statistical Problems Encountered in the Programs of the Small Defense Plants Administration. 
Roderick H. Riley, Small Defense Plants Administration.


Statistics on small business are extremely inadequate, there having as yet been no consistent 
focusing on small business qua small business. A basic handicap is lack of a suitable definition of small 
business, but progress is being made toward one, in which statistically significant variations among 
industries are recognized, to replace the present uniform test of 500 employees. Adoption by SDPA of 
such improved definition for use in Government programs, as authorized by statute, would also hasten 
the achievement of greater uniformity in program statistics, which is sorely needed. 

Problems of general statistics on small business, necessary for broad policy objectives of Congress, 
are more fundamental. Necessary cross-tabulation requires substantial increase in size of samples, with 
accompanying difficulties of insuring reliability. This entails additional costs, which must be weighed 
against the importance of additional information on small business which constitutes only a limited 
contribution to over-all knowledge of the economy. 

SDPA works with major statistical agencies to maximize usefulness of standard business series for 
small business analysis, such as through retabulation of Census establishment data by size of company 
and by product as well as by industry. 


Organizational, Personnel, and Statistical Problems Facing the Neophyte Station Statistician. J. G. Darroch, State College of Washington.

The purpose of the paper was to outline the problems encountered in a recently created position of 
experiment station statistician at the State College of Washington. Problems cited were derived from 
the three principal functions of the position: consultant to the research group, review of scientific papers
and bulletins, and research project supervision. A diverse agriculture, much of it horticultural, introduced
the consultant to a wide range of crops, many of them perennial in habit. This perennial feature, coupled 
with a complete or partial dependence upon irrigation, demands a careful evaluation of design properties 
as related to ease of management. Salvaging of older perennial experiments presents a challenge. Many 
crops involve multiple harvests; thus labor management must receive consideration when planning the
experiments. Entomological experiments encountered were frequently associated with either sampling 
or transformation questions. Problems of technique are rather prevalent in poultry nutrition, among 













SUMMARIES OF PAPERS AT THE 112TH ANNUAL MEETING 635 


others, differential sex response to treatment needs to be recognized as a real source of error. These are 
only a few of the problems mentioned. The review of manuscripts has been found valuable in the 
initial stages as a means of becoming familiar with the research problems in progress, the research 
personnel involved and as a basis for educational efforts. The degree of utilization of the statistical con- 
sultant is largely dependent upon the general level of statistical knowledge of the research group; thus 
an educational program is imperative before one can expect to be presented with really challenging prob-
lems.


Organization and Scope of Activities of Station Statistician. Carl E. Marshall, Oklahoma Agricultural
and Mechanical College. 

This paper gives a picture of the experiment station’s statistician in the Land Grant Colleges of the 
United States based on letters of inquiry to the directors of our forty-eight Agricultural Experiment 
Stations. A summary of the forty-four replies follows: 

Station statisticians are staff members of about one-third of our Land Grant Colleges. If there is 
more than one at a station, they are usually associated with some administrative unit such as a statistical 
laboratory or a department of statistics; otherwise they are attached to the director’s staff or are mem- 
bers of some subject matter department such as agricultural economics, etc. If no statistician is desig- 
nated as station statistician, the consultant usually is a member of some department. In that case, he 
often cuts across departmental lines in rendering service to the station. 

The scope of activities of the statistician is very broad. Through his efforts to keep abreast of the 
many fields of research, he may act as a coordinator of research among the many departments. He is 
connected with the teaching program of the college, usually in the field of statistics. His services are 
extended to the in-service statistical training of the research staff. Computing services are available at 
most stations under the supervision of the statistician. There is considerable variation in the training 
deemed essential in fields outside of statistics. 


Research on Extent and Scope of Collective Bargaining. Kinx R. Persuex, Washington, D.C. 

The paper deals first with the inadequate knowledge of extent of coverage of collective agreements, 
the difficulty of picking a representative sample from what is essentially an unknown universe, and the 
shortcomings of such quantitative analyses of collective agreement provisions which are being made on 
this basis.

Then methods of studying bargaining patterns are proposed which would throw light on the uniformity or
variation between agreements and on the way different clauses develop and are passed from one agreement
to another. Bargaining is explained as a year-round activity, of which informal accommodations of daily
problems are an important part, as are grievance and arbitration cases. In fact, these provide the formal
agreement with real content. Research into informal arrangements, into unofficial reports of mediators
(state or federal), and into the substance of arbitration decisions uniquely illuminates the process and
contents of collective bargaining. Most of this research remains to be done. Finally, the substance of
bargaining scope, i.e., the limits of management prerogative, has been subjected to too much detailed
research, rather than being recognized as a matter of bargaining and hence pragmatically determined in
each case.


Comparison of the Means of Two Samples. David L. Wallace, Princeton University.


Two samples of measurements of the results of a process are given. In each of the samples a different 
treatment was used. Procedures for comparing the effects of the two treatments are discussed. Interest is 
restricted to summarization of the results of the experiment by a confidence interval. For paired samples, 
procedures based on Student’s t-statistic, the sign test count, Wilcoxon’s signed-rank sum, Lord’s short-
cut version of t using range, and Walsh’s range-midrange test are considered. For unpaired samples, the
two-sample Student’s t-procedure, its modification to allow for unequal variability within the two
samples and Wilcoxon’s two-sample count procedure are considered. Practical methods for constructing 
confidence intervals are shown for each procedure. The different procedures are compared according to 
the criteria of distribution restrictions necessary for validity, power, and ease of application. 
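The abstract does not reproduce Wallace's practical constructions. As an illustration of one procedure on his list, the sign test count, the following Python sketch builds the standard distribution-free confidence interval for the median of paired differences from order statistics: for continuous data, the interval from the k-th smallest to the k-th largest difference covers the median with probability 1 - 2P(Bin(n, 1/2) <= k - 1). The function name and the rule of taking the largest k that still meets the requested confidence are assumptions.

```python
from math import comb


def sign_test_interval(diffs, confidence=0.95):
    """Sign-test confidence interval for the median of paired differences.

    Returns (lower, upper, exact_coverage), where lower is the k-th smallest
    and upper the k-th largest difference, with k as large as possible
    subject to exact coverage >= confidence.
    """
    d = sorted(diffs)
    n = len(d)

    def lower_tail(m):
        # P(Bin(n, 1/2) <= m)
        return sum(comb(n, r) for r in range(m + 1)) / 2 ** n

    best = None
    for k in range(1, n // 2 + 1):
        coverage = 1 - 2 * lower_tail(k - 1)
        if coverage >= confidence:
            best = (k, coverage)
        else:
            break  # coverage only decreases as k grows
    if best is None:
        raise ValueError("too few differences for the requested confidence")
    k, coverage = best
    return d[k - 1], d[n - k], coverage
```

For ten differences, k = 2 gives exact coverage 1002/1024, about 0.979, while k = 3 falls to about 0.891, so the 95 per cent interval runs from the second smallest to the second largest difference.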


Production of Vital Statistics as a Combined Federal-State Operation. O. K. Sagen, Illinois Department
of Public Health.

Up to the present the national vital statistics of the United States have been compiled from individual 
transcripts of birth and death certificates furnished to the Federal Government by the states. This results 
in a considerable degree of duplication, since the same data are coded, key-punched, and tabulated at 
both levels of government. To eliminate part of this a procedure has been developed whereby the states 
may furnish duplicates of their punched cards to the National Office of Vital Statistics for national 
tabulations. The procedure was experimentally tested with the State of Illinois on births in 1950 and in











636 AMERICAN STATISTICAL ASSOCIATION JOURNAL, SEPTEMBER 1953 


1951. Duplicates of the Illinois punched cards will be used in the production of national birth statistics
for 1951 and after. Two additional states will submit punched cards for 1953 and others are planning 
this for 1954. One state is experimenting on the feasibility of furnishing pretabulated data in the form 
required for national statistics. 

This combined operation requires adherence to uniform definitions and interpretations, as well as 
consistency and accuracy in processing by the participating states and the National Office of Vital
Statistics. The basis for such cooperation has been laid in the development of close working relationships 
over a period of years. Death data present particular difficulties in statistical processing because of the 
complications in cause-of-death coding. In time it is expected that national death statistics can also be 
produced by a combined operation such as has been started for births. 


Research on Response Errors. Eli S. Marks, Bureau of the Census.


Census Bureau research takes as basic the distinction between “response variance” and “response 
bias.” While measurement of bias is much more difficult than measurement of variance, results obtained 
point to the need for bias studies. Large response variance may be associated with either large or small net
error (bias) in the Census. In the Post-Enumeration Survey of the 1950 Censuses (a reinterview study 
designed to check the accuracy of the Censuses), over one-third of the persons reported in certain cate- 
gories in the Census (e.g., 1949 individual incomes of $2500-2999, occupied dilapidated dwelling units) 
were reported as not in the category on reinterview. However, net differences between Census and re- 
interview results are, in general, small—less than 10 per cent in most cases. It is quite possible for a 
large proportion of the individual reports to show substantial reporting errors without any significant 
effect upon the distribution of the entire population or upon the conclusions that might be drawn from 
the data. On the other hand, a consistent error in reporting, even though it affects only a small propor- 
tion of the individual reports, may result in substantial distortion of the distribution of the entire popula- 
tion and of conclusions based upon this distribution. 
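The contrast between many individual disagreements and a small net difference can be shown with a 2 x 2 table of Census versus reinterview classifications. The counts used below are hypothetical, chosen only to mirror the pattern described. The three summary measures (share of Census reports in the category not confirmed on reinterview, gross difference rate, and relative net difference of the two totals) are one reasonable way to express it and are not necessarily the Bureau's definitions.

```python
def response_error_summary(both, census_only, reinterview_only, neither):
    """Summarize agreement between Census and reinterview for one category.

    both             -- in the category in both Census and reinterview
    census_only      -- in the category in the Census only
    reinterview_only -- in the category on reinterview only
    neither          -- in the category in neither
    """
    n = both + census_only + reinterview_only + neither
    census_total = both + census_only
    reinterview_total = both + reinterview_only
    return {
        # Share of Census category members placed elsewhere on reinterview.
        "share_not_confirmed": census_only / census_total,
        # Share of all persons classified differently by the two sources.
        "gross_difference_rate": (census_only + reinterview_only) / n,
        # Net difference between the two category totals, relative to reinterview.
        "net_relative_difference": (census_total - reinterview_total) / reinterview_total,
    }
```

With hypothetical counts of 600, 320, 300 and 8780, about 35 per cent of the Census reports in the category are not confirmed, yet the two totals (920 and 900) differ by only about 2 per cent, because errors in the two directions largely offset.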


The Distribution of Government Burdens and Benefits. Rufus Tucker, General Motors Corporation.


No sound judgment can be formed concerning either the equity or the economic burden of a tax
system unless the distribution of the benefits of government activity financed by the tax system is also 
studied. This paper is limited to distribution by income classes. 

The redistribution of income was accelerated between 1929 and 1948 and for this the progressive 
nature of our tax system was partly responsible. But the increasing tendency to spend government 
money for the benefit of the lower and middle income classes was more responsible. 

The burden of all taxes, direct and indirect, rose from under 12% of income in 1929 to over 27% in 
1948. The burden on the poorest tenth rose from under 9% to nearly 17%; the burden on the wealthiest 
one-hundredth rose from 18% to 51%.

Although the average income per spending unit, measured in constant dollars and after deducting 
income taxes, rose 36% from 1929 to 1948, the average income of the top one-fifth of spending units, 
measured in the same way, fell 10%. 

Some government expenditures are plainly for the direct and sometimes exclusive benefit of certain 
classes and can be allocated to income classes with a fair degree of confidence. Other expenditures are 
for the general welfare, and can be allocated with equal logic on the basis of consumption or income or 
property, or per capita. We find that the poorest half of the population has been receiving more benefits 
from government than it has paid for, while the wealthiest one-tenth has been paying for much more
than it received. 

In 1948 the ten per cent of consumer units that received the lowest incomes received 2.3% of all 
income, paid 1.4% of all taxes, and received between 3.9% and 7.0% of all government benefits. The 
wealthiest ten per cent received 31.3% of all income, paid 45.3% of all taxes, and received between 
14.2% and 31.4% of all government benefits. The figures for preceding years show similar relationships, 
with greater inequality of income and less progression in taxes. 
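The 1948 shares quoted above can be compared directly. Dividing a group's share of all taxes by its share of all income gives its tax burden relative to the average (below 1 means a lighter-than-average burden); dividing its share of benefits by its share of taxes shows whether it received more than it paid for. The Python sketch only rearranges the paper's figures; the two ratio measures and the function names are illustrative choices.

```python
# 1948 shares (per cent of the national total) as reported in the paper.
shares_1948 = {
    "lowest tenth": {"income": 2.3, "taxes": 1.4, "benefits": (3.9, 7.0)},
    "highest tenth": {"income": 31.3, "taxes": 45.3, "benefits": (14.2, 31.4)},
}


def relative_burden(group):
    """Share of taxes divided by share of income (1.0 = average burden)."""
    return group["taxes"] / group["income"]


def benefit_to_tax_range(group):
    """Range of benefit share divided by tax share (> 1 = receives more than it pays)."""
    low, high = group["benefits"]
    return low / group["taxes"], high / group["taxes"]
```

For the lowest tenth the tax-to-income ratio is about 0.61 and the benefit-to-tax ratio runs from about 2.8 to 5.0; for the highest tenth the corresponding figures are about 1.45 and about 0.31 to 0.69, consistent with the paper's conclusion.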

Our system of taxation is progressive against income, and even more progressive against benefits 
received. It has already reduced the disposable income of the wealthiest twenty per cent of our people 
(those belonging to consumer units with incomes over $5000 in 1948). It is time to consider seriously 
whether higher taxes on the wealthy, or higher taxes in general, may not reduce the total national in- 
come, or at least prevent its growth, with other undesirable economic, political, and moral conse- 
quences. 


General Equilibrium Aspects of Incidence Theory. Richard A. Musgrave, University of Michigan.

In formulating incidence theory, it is necessary to define what is meant by incidence and effects of 
taxation. I propose that the former be defined as the change in the distribution of real income by size 
income brackets, the latter as the change in the level of real output which results from the economy's
adjustment to a change in budget policy. A further important distinction is between absolute incidence,
where a tax situation is compared with a no-tax situation, and differential incidence, where the respective
results of two taxes providing equal yield are compared. For various reasons the latter approach is
preferred.

The problem of general equilibrium in incidence analysis may be demonstrated with regard to the
incidence of excise taxes. The conclusions are: (1) It is only of minor importance whether the initial 
adjustment to the tax takes the form of increasing the price of the taxed commodity while holding factor 
payments constant, or of decreasing factor payments while holding the price of the taxed commodity 
constant. What matters is the resulting change in the relative market prices of consumer and of capital 
goods. (2) The result of an excise on consumer goods only will be to raise the price of consumer goods 
relative to the price of capital goods, whatever be the change in the absolute level of prices. Such a tax, 
therefore, will fall on the consumers. Since consumption expenditures decline as a percentage of typical 
family budgets when moving up the income scale, the incidence is regressive. (3) The result of an excise 
on capital goods only will tend to leave relative prices of capital and consumer goods unchanged. Such a 
tax, therefore, will fall on both consumers and savers (buyers of capital goods) and tend to be distributed 
proportionately. However, a general retail sales tax is largely a tax on consumer goods and it is justified, 
therefore, to impute such a tax to the purchasers of the consumer goods. 

The above argument involves certain simplifying assumptions. In particular, we have disregarded 
possible resulting changes in the distribution of money earnings, and possible resulting changes in total 
output. However, there are reasons to expect that such changes will be distributionally neutral and that 
they may be disregarded, at least in a first approximation to the problem. 
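Point (2) can be illustrated numerically. Assuming, as in the argument above, that an excise on consumer goods is borne by consumers in proportion to their consumption spending, the tax as a share of income equals the tax rate times the consumption-to-income ratio, which falls as income rises. The family budgets in the usage check are hypothetical.

```python
def effective_excise_rate(income, consumption, tax_rate):
    """Excise paid as a share of income, assuming full shifting to consumers.

    The tax paid is tax_rate * consumption, so the effective rate on income
    is tax_rate * (consumption / income).
    """
    return tax_rate * consumption / income
```

With a 10 per cent excise and hypothetical families spending all of a $2000 income, $4500 of $5000, and $14000 of $20000 on consumption, the effective rates are 10, 9 and 7 per cent of income: a regressive pattern, as the paper concludes.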


Exposition of Straight Line Fitting Methods. Richard F. Link, Princeton University.

Suppose $\eta = A + B\xi$ describes the relationship between two variables $(\xi, \eta)$. The problem of
estimating and setting confidence limits on $A$ and $B$, given paired estimates $(x, y)$ of $(\xi, \eta)$, is
discussed.

The classical case of $x$ measured without error and $y = \eta + w$, with $E(w) = 0$ and
$\operatorname{var}(w) = \sigma^2$, is discussed. The classical least squares and the Nair and Shrivastava
procedures (Sankhyā, 6, 1942) for obtaining estimates of $A$ and $B$ are illustrated. The relative
precision of the estimate of $B$ for the two procedures is indicated. Confidence limits are found for $A$
and $B$ by classical methods assuming $y$ to be distributed according to $N(\eta, \sigma^2)$. Confidence
limits for $A$ and $B$ are also found using short-cut techniques involving the use of the range.

The case of $x$ measured with errors is discussed. The additional assumptions and information
necessary for handling this type of data are indicated. In particular, the use of an instrumental variate
is discussed.
The methods for estimating $A$ and $B$ under these assumptions proposed by J. W. Tukey (Biometrics,
7, No. 1, 1951) are illustrated. The procedure for obtaining confidence limits for $B$ with these methods is
also illustrated. The Nair and Shrivastava procedure is also illustrated for these assumptions.
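For the case of $x$ measured without error, the two point estimators can be sketched side by side in Python. The least-squares part is standard. The grouping estimator is an assumption about the Nair and Shrivastava recipe: it orders the points by $x$, takes the slope from the mean points of the lowest and highest thirds, and passes the line through the overall means; their paper may differ in details such as group sizes or the intercept. Confidence limits and the range-based short cuts are not shown.

```python
from statistics import mean


def least_squares_line(xs, ys):
    """Classical least-squares estimates (A, B) for y = A + B x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def three_group_line(xs, ys):
    """Grouping estimate (A, B): slope from the outer thirds (assumed rule).

    Points are ordered by x; the slope joins the mean point of the lowest
    third to that of the highest third, and the intercept makes the line
    pass through the overall means (an assumed convention).
    """
    points = sorted(zip(xs, ys))
    k = len(points) // 3
    if k == 0:
        raise ValueError("need at least three points")
    low, high = points[:k], points[-k:]
    x_low, y_low = mean(p[0] for p in low), mean(p[1] for p in low)
    x_high, y_high = mean(p[0] for p in high), mean(p[1] for p in high)
    slope = (y_high - y_low) / (x_high - x_low)
    return mean(ys) - slope * mean(xs), slope
```

On points lying exactly on a line both functions recover it; on noisy data the grouping slope is simpler to compute but, as the paper notes, its precision relative to least squares has to be assessed.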


Broadening the Significance of Vital Statistics Through Special Studies. Paul M. Densen, University
of Pittsburgh. 

In the past, routine vital statistics of births, deaths and infectious diseases sufficed to answer many 
of our statistical needs. These vital statistics often pointed up the need for more detailed investigations of 
the problems and much of value was learned through related studies. Full significance of vital statistics 
has always required the evaluation of special studies. 

Routine vital statistics continue to point out where our problems lie, but they no longer provide 
sensitive indices to the magnitude of the problem and to the effectiveness of efforts to improve the health 
of the population. Neither our mortality statistics nor our notifiable disease statistics give any indication 
of the magnitude and distribution of diseases of long duration or those with low mortality rates and high 
disability rates. Such information as we do have comes from special studies of one kind or another, 
particularly morbidity surveys. 

Vital statistics have made it clear that the character of our public health problems is changing. 
With these changes have come concomitant changes in public health practices. There is urgent need to 
evaluate the effectiveness of these changes in practice in improving the health of the population. 

There are several reasons why such evaluation is essential to continued progress in public health: 

(1) The influence of public health programs on the health of the population is far more subtle than 
it used to be and more sensitive measuring instruments are needed to measure this influence. It is these 
relatively more subtle changes that must be measured if we are to continue to interest the public in sup-
porting public health programs. We can only do this when we are in a position to produce direct evidence
of the effect on health of specific procedures. Such evidence can only come from carefully controlled 
special studies. 

(2) The unit of operation of public health today is becoming increasingly the individual rather than
the population as a whole. It can no longer be said that if a local health department is available there
will necessarily be less diphtheria, less typhoid, and fewer maternal deaths, because these are very minor
problems today. On what basis, then, shall expenditures for health work be justified? The objectives of
present public health programs must be examined, and studies designed to evaluate achievements in
relation to these objectives, if such justifications are to be found.

The several functions of public health statistics are best served by a combination of routine vital 
statistics and special studies. If the necessity for such a combination is to be recognized and provided 
for administratively, administrators and program directors must come to realize the basic quantitative 
nature of the problems facing them in the development of public health policy. A demonstration is 
needed of the way in which the statistical approach can help the administrator in the formulation of
policy. 


Improving Marriage and Divorce Statistics. Hugh Carter, Public Health Service, Federal Security
Agency.
This paper is published in full elsewhere in this issue. 


Pension Plans—The Concept of Actuarial Soundness. Dorrance C. Bronson. 


As a minimum for actuarial soundness, the employer should currently fund the pension credits 
applicable to the years elapsing after the plan’s inception and should, by retirement age, have funded 
the past service credits for the then retiring employees. These definitions will not satisfy all parties; at 
best the concept of actuarial soundness falls in a penumbra of meanings and techniques. 
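One common way to meet the second part of this minimum, funding past-service credits over a fixed period, is a level annual amortization payment. The Python sketch below computes that payment under simplifying assumptions: a single known liability, a constant interest rate, payments at the end of each year, and no mortality, turnover, or salary assumptions, all of which the paper notes a real valuation must include. The function name and figures are hypothetical.

```python
def level_amortization_payment(liability, rate, years):
    """Level end-of-year payment that funds `liability` over `years`.

    Uses the annuity-immediate formula P = L * i / (1 - (1 + i) ** -n),
    so the present value of the n payments at rate i equals L.
    """
    if years <= 0:
        raise ValueError("years must be positive")
    if rate == 0:
        return liability / years
    return liability * rate / (1 - (1 + rate) ** -years)
```

For a hypothetical past-service liability of $100,000 funded over 10 years at an assumed 2.5 per cent, the payment is roughly $11,426 a year; the check is that the payments, discounted at the same rate, reproduce the liability.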

The paper describes three different purposes of the actuarial reserve of a pension fund (in the sense 
of assets in the fund), the main one being the security for benefits under a presumption that the plan will
terminate, because if a plan is to continue in perpetuity, the reserve only serves to earn interest to help
pay the pensioners. It is only when the plan terminates that the security for pensioners and employees
represented by the reserve demonstrates itself. A second purpose of a reserve system is to provide a
framework for the scientific, orderly funding of the plan. A third purpose of the reserve, of rather recent
origin, is to make money for the employer or, in some instances, to earn increased pensions for employees. 
This purpose is illustrated by an investment policy for the fund aimed at substantially higher yields, or 
capital gains, than are required by the rate of interest assumed by the actuary. 

Thus, any investment return earned above the actuarial rate would revert annually to the credit of 
the employer. The paper points out that the true pension costs really lie only in the actual disbursements 
under the plan, which cannot be changed by actuarial assumptions. The actuary can only estimate and 
level out his estimates for orderly funding. These estimates involve numerous assumptions: rates of
mortality, disability, retirement (by age), separation from service, and entrance into service; also the 
rate of interest mentioned above and in some cases, the rate at which salary improves with age. An 
expense rate is sometimes assumed if the fund is to meet the expenses. 
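The arrangement in which returns above the actuarial rate revert annually to the employer amounts to recognizing each year's interest gain: the difference between the fund's actual investment earnings and what the actuary's assumed rate would have produced. A minimal Python sketch, assuming contributions and benefit payments occur at year end so that only opening assets earn interest; the function name and figures are hypothetical.

```python
def interest_gain(assets_start, contributions, benefits, actual_rate, assumed_rate):
    """One-year investment gain relative to the actuarial interest assumption.

    Assumes contributions and benefits are paid at year end, so only the
    opening assets earn interest during the year.
    """
    expected_end = assets_start * (1 + assumed_rate) + contributions - benefits
    actual_end = assets_start * (1 + actual_rate) + contributions - benefits
    return actual_end - expected_end
```

A hypothetical fund of $1,000,000 earning 4 per cent against an assumed 2.5 per cent shows a gain of $15,000 for the year, whatever the contributions and benefits, since under this timing they cancel out of the comparison.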

The paper touches upon actuarial methods for determining contributions and the incidence thereof 
and describes several connotations of “over funding” and “under funding.” 

In the field of public employee plans, the taxing power is often substituted for actuarial soundness. 
A retired public employee cannot feel too secure where little or no fund stands back of his pension. The 
federal Civil Service Retirement Act, while having sufficient funds, perhaps, for the existing pension
roll and for accumulated employee contributions, does not have any assets—but only the taxing power 
—against the Government obligations for accrued pensions on active employees. It is not actuarially 
sound according to the tenets of this paper. The same thing holds true for the Railroad Retirement Act 
—a quasi Governmental plan. 

The paper points out that Social Security probably transcends the usual test of actuarial soundness. 
There is little rationale for a large reserve accumulation invested in Government bonds for a system of 
this type. 

A revision of the Social Security benefit system to include all present aged, as is being done in 
Canada and as is being currently subjected to a referendum here through the United States Chamber of 
Commerce, would resolve much of the argument for and against the accumulation of large reserves and 
would sensitize the contribution (tax rate) to an immediate appreciation of the cost of liberalization in 
benefits. 

The paper discusses the dangers of inflation, which can make an achievement of actuarial soundness
meaningless and can make difficult, if not impossible, continued rounds of higher benefits under the plan
for the existing pension roll. Perhaps only Social Security can raise benefits “across the board” because
of inflation. The paper questions whether the country can really stand actuarial soundness in the large
accumulation of investments which this would mean. If all employers with 50 or more employees were to
set up actuarially sound plans, it might entail reserve assets approaching the present national debt. Would
our capital plant and its productivity keep pace with the savings necessary to represent this degree of
actuarial soundness?

Finally the paper discusses the consequences of lack of actuarial soundness—citing the Railroad 
situation prior to the Railroad Retirement Act—calling attention to the coal fund and raising the ques- 
tion as to what extent, if funds are not sound, an ultimate bailing out by nationalization of the pension 
plans might ensue. Fortunately most industrial pension plans are proceeding on a basis which would not 
bring them to these straits, but this, in the main, has been during a period of ready money and high 
tax rates so that whether these indicated good intentions of making contributions on an actuarial basis 
continue, in a different economic milieu, is one of the questions for the future. 


Labor’s View on Actuarial Requirements for Pension Plans. Solomon Barkin, Textile Workers Union
of America, CIO. 

Trade-unions, early in the days of the current pension movement, directed their attention to 
establishing benefits for the superannuated and those about to be retired. The first plans centered around 
providing retirement benefits of $100 for employees with 25 years of service with a company, with re- 
duced amounts for employees with lesser service. The unions relied upon the financial solvency of the 
particular company in the years ahead to meet the obligation. 

As collective bargaining developed, there was an increasing emphasis upon fixed cost plans with 
defined benefits. Both management and unions favored these programs since they provided a more 
determinable basis for negotiations. Four types of fixed cost plans were evolved: (1) a fixed hourly rate; 
(2) a defined obligation to meet the cost of current service and amortization of past service over a defined 
period; (3) fixed percentages of payroll; and (4) fixed charges on units of output. Unions have also 
identified the contribution as a form of wage payment. With the fixing of the contribution, they insisted 
upon the separation of the sums into trusteed funds. Unions have disapproved of profit-sharing systems 
as methods of financing these funds, as they do not provide fixed rates of contribution. 

The segregation of the pension funds and the fixing of the employer’s rate of contribution have 
opened up opportunities for determining the best utilization of these funds. Unions are promoting the 
establishment of a worker's right to the benefits, even if he does not remain with the company the full
service period required for the full benefits. In newer industries, the early vesting privileges are being 
combined with provisions for separation pay to enable employees to arrange their transfer to new jobs 
with greater ease.

Other developments may be noted: 

1. The employer's rate of contribution has been increased as the pension funds have increased their
benefits.

2. The increase in federal Social Security benefits has led to improvements in the benefits received,
since many plans provide for workers to share all or part of these improvements without reduction of
their benefits under the private pension.

3. Unions have insisted that the actuarial gains resulting from the use of low interest and turnover
rates and the assumption of early retirement be kept in the fund.

4. Unions are increasingly favoring self-administered and self-insured plans.

5. Cost-of-living adjustments have been discussed in some negotiations.

6. More adequate provision for the disabled is receiving attention.

7. Benefits have been made more liberal for employees of longer service and in the higher wage
brackets.

Funds built on conservative principles have shown impressive actuarial gains. The increased com- 
prehension of this entire mechanism by trade-union leaders has also provided a base for adapting the 
benefits to the peculiarities of different industries and groups of workers, as well as improving them so 
that actuarial re-evaluations are necessary. 

Unions are aware of the need for keeping “sound” pension funds. But they are armed with the
experience that the rate of contribution is not necessarily fixed. In an expanding economy, actuarial gains
and higher contributions provide the base for improved benefits. Pensions should be adequate to enable
those unable to work to retire voluntarily with reasonable financial security. These programs must not
be the vehicles for forcing the retirement of older workers. Self-administration by joint committees of
management and unions affords the greatest opportunity for promoting realistic collective bargaining
and effecting changes which will better realize the purposes of the plan.




Some Recent Developments in Canadian Statistics. Herbert Marshall, Ottawa, Canada.

The paper describes some of the new methods used in taking the 1951 Population and Housing
Census of Canada. These included a much greater mechanization of the Census operations. A “mark-
sense” document was used in the field, and the completed documents were run through a new document
punch machine to produce a punched Hollerith card. This and other new methods are resulting in a
one-half reduction in the time normally required for completing a census and in large monetary
savings.

Developments in the field of Industrial Statistics include a revised index of industrial production in 
which the concept of “net output” is used as a current indicator for numerous industries instead of gross 
value of production. Other developments consist of sample surveys furnishing information adequate to 
permit projecting annual statistics on a monthly basis for sales and inventories, thus furnishing current 
indicators. 

In Health statistics the main developments have been the undertaking of a sample survey of sick- 
ness in the general population and the organization of a much more adequate system of hospital sta- 
tistics. 

In Agriculture, experiments are being made for obtaining current statistics in certain sectors of the
field with probability sampling based on new information secured in the 1951 Census of Agriculture.

The new Consumer Price Index put out by Canada in October 1952 was the result of three years of
preparatory work. The methods used include numerous departures from those used in the Cost-of-Living 
Index which it replaces. Two important developments were the introduction of a method to adjust 
for seasonal changes in the consumption pattern for certain foods and the inclusion of a measurement 
of home ownership costs. 


Some Recent Applications of Statistics in Australia. Maurice H. Belz, University of Melbourne.


At the National University in Canberra, a Department of Mathematical Statistics has been 
created, with research and advisory functions, while in the Universities of Melbourne and Adelaide 
progress has been made in extending the statistical teaching to undergraduate students in various 
faculties as well as to graduate students and research workers. Advisory work is undertaken by the 
Department of Statistics in the University of Melbourne on behalf of the various faculties in which 
experimental work is conducted, and the same kind of service is performed in other Australian
Universities by the several statisticians attached to the various Departments of Mathematics. Some outside
consulting work in statistics is also undertaken on a contractual basis.

In the research field, engineering and agricultural experiments have introduced modern statistical 
techniques, such as factorial, split-plot, partial replication and multiple regression procedures. These 
have been employed in the gasification of coal, production of hard carbon from brown coal, briquetting 
of brown coal, traction research on various agricultural machinery, analysis of rainfall data, prediction 
of rainfall, secular variation of rainfall, ecological investigations, biological assays, etc. A considerable 
interest is also being displayed in medical statistics, both in Sydney and Melbourne. The Common- 
wealth Scientific and Industrial Research Organization continues to expand its statistical activities in 
almost all of its Divisions and Sections from Plant Industry to Tribophysics and from Forestry to Ani- 
mal Husbandry. 

Research in Mathematical Statistics is being pursued in the various centers, often inspired by prac- 
tical problems presented by the experimenter, for example, tasting experiments, missing values in certain 
complicated designs, separation of chemical solutes, and distribution of chain molecules. 


Recent Application of Quality Control in Japan. N. Yamaki, Mitsubishi Denki K. K.


In 1946 quality control as a means of quick rehabilitation in the post-war world was investigated. 
In Mitsubishi Denki K.K., our preliminary study had three phases: (1) Preparation of booklets to be 
circulated throughout the organization. (2) Training of a specialist in statistical quality control. (3) Prac- 
tical tryout of the methods of statistical quality control. 

Two events were influential in Japan: (1) Japanese Electric Communication Industries sponsored a
management course under the guidance of the Civil Communication Section of the Occupation Forces.
The objective of the course was to train top management people in scientific management, and statistical 
quality control was one of its main sections. (2) Japanese Union of Scientists and Engineers started a 
training course for engineers on statistical quality control. 

In 1950, two lines of very important work in this field were developed for Japanese industry as a 
whole: (1) Dr. W. Edwards Deming visited Japan and was extremely influential in the quality control 
work in Japan. (2) The Japanese Standards Association organized a statistical quality control com- 
mittee, in order to develop some standards in quality control methods. 

In the fall of 1951, we had within Mitsubishi organization twenty full time workers and one hundred 
and eleven part time workers for statistical quality control. Our saving achieved by that time through 
statistical quality control was estimated at roughly 0.5% of our current sales volume of $28,000,000 in 
1951. Many firms in Japan have been following quality control programs more or less similar to ours. 
There have been similar difficulties, problems, successes and failures. Probably one of the fundamental
problems, which has not yet been fully solved, is the lack of a tight but flexible tie between statistics and









SUMMARIES OF PAPERS AT THE 112TH ANNUAL MEETING 641 


Statistical Organization and Estimates of Crops in West Bengal, India. N. Chakravarti, State Statistical
Bureau, Calcutta.

Statistical work of the West Bengal Government is centralized in the State Statistical Bureau, which 
serves as coordinator of data processing, standardizer of reporting forms and publisher of all routine 
statistics. All statistical surveys are designed and executed by the Bureau. Examples of topics covered 
are: annual industrial census, acreage and yield estimates for crops, price indices, cost of living indices, 
irrigation and hydroelectric benefit assessment rates, and family budgets. Special ad hoc surveys are 
also made; examples are: living characteristics of middle and lower class families in order to set minimum 
wages for government employees; refugee population counts; relationship between rental values and tax 
assessment values; morbidity, birth and death data. 

Crop sample surveys are used to implement the government food control and rationing program.
Since movement of crops from one district to another is prohibited, accurate data on district production
are required. Sample surveys involve sampling units of about 2 acres selected systematically at random at
intervals of half a square mile of cultivated area. The sample is divided into two subsamples on the
basis of odd- and even-numbered sampling units. Investigator variance and bias are estimated by
replication for fifteen per cent of the sample units. Special precautions are taken to revise estimates in the light
of subsequent crop disasters caused by flood, pestilence or weather.
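The abstract says the sample is split into odd- and even-numbered subsamples but does not say how the split is used. A minimal Python sketch, assuming the usual interpenetrating-subsample idea, is given here: each half yields its own estimate, and the disagreement between the halves gives a variance estimate for the combined figure. The function names and plot yields are hypothetical, and the variance formula treats the two halves as if they were independent replicates, which a single systematic sample only approximates.

```python
from statistics import mean


def split_odd_even(yields):
    """Split yields into odd- and even-numbered sampling units (1-based)."""
    odd = yields[0::2]   # units 1, 3, 5, ...
    even = yields[1::2]  # units 2, 4, 6, ...
    return odd, even


def estimate_with_se(yields):
    """Combine two subsample means and estimate the standard error.

    If m1 and m2 are independent estimates with equal variance, then
    (m1 - m2)**2 / 4 is an unbiased estimate of the variance of their
    average, so the standard error is |m1 - m2| / 2.
    """
    odd, even = split_odd_even(yields)
    if not odd or not even:
        raise ValueError("need at least one unit in each subsample")
    m1, m2 = mean(odd), mean(even)
    combined = (m1 + m2) / 2
    se = abs(m1 - m2) / 2
    return combined, se


# Hypothetical yields per acre for eight sampled plots, in systematic order.
plots = [10.0, 12.0, 11.0, 13.0, 9.0, 12.5, 10.5, 11.5]
print(estimate_with_se(plots))
```

A large gap between the two subsample means flags either genuine sampling variability or a systematic difference between the halves, such as one set of investigators working differently.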


Testing the Homogeneity of Treatment Means in an Analysis of Variance of Engineering Data. D. B.
Duncan, Virginia Polytechnic Institute.

This is a discussion of several methods recently proposed for testing the significance of differences 
between treatment means in an analysis of variance. The methods include (A) a studentized range
testing procedure, Newman (1939) and Keuls (1952); (B) a multiple F testing procedure termed the
Multiple Comparisons Test, Duncan (1947, 1951); and two test procedures (C) and (D) derived from the
confidence interval methods of Tukey (1952) and Scheffé (1952), respectively.

The basic differences between these procedures are classified and illustrated graphically in a simple 
5% level case involving only three means. The most important difference is the use by A and B of a 
successive-tests principle not used by C and D. This makes A and B considerably more powerful than 
C and D, without any inappropriate increases in error rates. The second difference concerns the relative 
significance level of individual tests in the successive test procedures A and B. B is more powerful than 
A through using a special system of levels, the validity of which is briefly discussed. Other less im- 
portant points of difference are also illustrated. A separate section is included entitled “The Multiple 
Comparisons Test Extended to the Problem of Separating Treatment Means With Unequal Replica- 
tions.” 
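The successive-tests principle that separates procedures A and B from C and D can be illustrated for the Newman-Keuls case (A). The Python sketch below is not Duncan's own formulation; it assumes equal replication, a known standard error of a treatment mean, and critical studentized-range values supplied by the caller from a table. The function name and the example numbers are hypothetical; 2.95 and 3.58 are used as illustrative critical values close to the tabled 5% points for 20 error degrees of freedom.

```python
def step_down_ranges(means, se_mean, q):
    """Newman-Keuls-style step-down test on sorted treatment means.

    means   : treatment means (equal replication assumed)
    se_mean : standard error of a single treatment mean
    q       : dict mapping p (number of means spanned) to the critical
              studentized range for that p, taken from a table
    Returns the set of (i, j) positions in the sorted means whose
    difference is declared significant.
    """
    m = sorted(means)
    k = len(m)
    significant = set()
    not_significant = []  # spans already judged non-significant
    for p in range(k, 1, -1):  # test the widest ranges first
        for i in range(k - p + 1):
            j = i + p - 1
            # Successive-tests principle: a span lying inside a
            # non-significant span is not tested at all.
            if any(a <= i and j <= b for a, b in not_significant):
                continue
            if m[j] - m[i] > q[p] * se_mean:
                significant.add((i, j))
            else:
                not_significant.append((i, j))
    return significant


# Hypothetical example with three means and illustrative critical values.
print(step_down_ranges([10.0, 12.0, 15.0], 1.0, {2: 2.95, 3: 3.58}))
```

Here the full range (5.0) exceeds 3.58, so the two adjacent pairs are then tested against 2.95; only the pair 12.0 and 15.0 differs. Procedures C and D, by contrast, compare every difference against a single fixed allowance, which is why they have less power.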


Some Applications of Statistics to Research in Time and Motion Study. H. C. Swzenr, Virginia Poly- 
technic Institute. 

Time and Motion Study is a procedure for determining the time required by an “average” operator, 
working at a normal tempo, to accomplish some task. At the present time, this procedure is more of an 
art than a science and, as such, contains many aspects which on close examination seem questionable. 
The use of statistical techniques in the past in both research and application of Time and Motion Study 
has been limited, and in the few cases wherein advanced techniques have been used, the appropriateness
of the model appears questionable. This paper reviews the special problems inherent in research on and
application of Time and Motion Study procedures, and discusses the application of statistical techniques to
these problems. An example using data from an experiment is given to emphasize the problems inherent
in research of this nature.


The Elements of an Industrial Classification Policy. Walt R. Simmons, U. S. Bureau of Labor Statistics.
This paper is published in full elsewhere in this issue. 


The Use of Statistical Techniques in the Accounting Department of a Large Manufacturing Company.
D. A. Livingston, Monsanto Chemical Company.

There has been a growing demand for statistical techniques in the area of accounting communication.
Charts, tables, and statistical reports not only streamline presentation but also minimize the time
required by management to locate significant relationships and trouble spots.

The work which the Statistician performs in the accounting department of Monsanto Chemical 
Company is: 1. Preparation of financial and operating chart series. 2. Periodical statistical reports and
special studies. 3. Construction and presentation of indexes of selling and raw material prices. 


Financial and Operating Charts 
Our chart program is composed of three basic series which are directed to the top levels of management.
The first of these series, the Director's charts, consists of six charts which are graphic income













statement summaries prepared quarterly to cover the operations of the Company. The Executive Com- 
mittee chart book series consists of about 110 annual and monthly charts and the accompanying 
tabular data. These charts highlight the operations of the Company and compare Monsanto’s results 
with other leading companies in the chemical industry. For each of the seven Divisional Managers, 
there is a divisional chart book, which has between 30 and 37 monthly and annual charts and tables which
review the operations of a division. 

The first or pivotal chart in the series shows the return on average investment ratio since it is an 
accepted Company policy to consider return on investment employed in the business as a prime measure 
of management effectiveness. Charts depicting the ratios of the various factors from net sales to net 
income considered with other ratios related to average investment shed light on the causes underlying 
variations in return. 
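One common way to connect the ratios running from net sales to net income with the ratios on average investment is to factor return on investment into a profit margin and an investment turnover. The Python sketch below assumes that decomposition; the abstract does not spell out the exact formulas behind Monsanto's charts, and the figures are made up.

```python
def roi_decomposition(net_income, net_sales, average_investment):
    """Split return on average investment into margin and turnover.

    return on investment = (net income / net sales) * (net sales / average investment)
    """
    margin = net_income / net_sales            # profit per dollar of sales
    turnover = net_sales / average_investment  # sales per dollar invested
    roi = net_income / average_investment      # the pivotal ratio
    return margin, turnover, roi


# Hypothetical figures in millions of dollars.
margin, turnover, roi = roi_decomposition(net_income=10.0, net_sales=100.0,
                                          average_investment=50.0)
print(f"margin={margin:.1%}, turnover={turnover:.2f}, ROI={roi:.1%}")
# prints: margin=10.0%, turnover=2.00, ROI=20.0%
```

A change in return can then be traced to a thinner or fatter margin, a slower or faster turnover, or both.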


Regular Periodical Reports and Special Studies 

Among the regular periodical statistical reports are a percentage of actual to rated capacity opera- 
tions report prepared monthly, a comparison of inventories by division and by inventory classification 
which is also issued monthly, and quarterly and annual comparisons of Monsanto’s operations with 
those of other leading chemical companies. One of the more interesting special studies in process at this 
time is one which traces in detail the income and financial growth since 1936 of the top seven chemical
companies. 

The responsibility for the initiation of any report and for its form and content resides in the
Comptroller. One of the main duties of the Statistician is to propose new areas needing investigation.


Indexes of Raw Material and Product Prices 

Monsanto has had selling price, raw material, and wage indexes which extend from, and were
based on, the year 1939. These indexes have been published regularly in the Company annual reports.
It was decided this year to revise the present indexes and convert them to a 1947-1949 average price
base.
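Converting an index from a single base year to a 1947-1949 average base amounts to dividing every value by the average of the three base-year values and multiplying by 100. The Python sketch below shows that arithmetic. The series is hypothetical, and it assumes the revision changes only the base. A real revision may also change weights, which this does not cover.

```python
from statistics import mean


def rebase(index_by_year, base_years):
    """Re-express an index so that the average over base_years equals 100."""
    base = mean(index_by_year[y] for y in base_years)
    return {year: 100.0 * value / base for year, value in index_by_year.items()}


# Hypothetical selling-price index on the old 1939 = 100 base.
old = {1939: 100.0, 1947: 180.0, 1948: 200.0, 1949: 220.0, 1952: 230.0}
new = rebase(old, base_years=(1947, 1948, 1949))
print(new)  # 1939 becomes 50.0 and 1952 becomes 115.0
```

Because every value is divided by the same constant, the ratio between any two years is unchanged; only the reference level moves.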

The financial and operating charts, reports, and the price indexes have been tailored to one major 
objective: the needs and understanding of management. They are steps in a program designed to utilize 
statistical techniques for better communication of information to management. 


Discussion of Congressional House Committee Report of the Investigation of the Federal Crop Reporting
Service. John J. Heimburger, Counsel, Committee on Agriculture, House of Representatives:
“Summary of House Committee Report.” J. Roger Wallace, New York Journal of Commerce:
“Forecasting and Estimating the Cotton Crop.” John D. Baker, Longstreet-Abbott Company,
St. Louis: “Evaluation of Forecast of the Wheat Crop.” Lauren Soth, Des Moines Register and
Tribune: “Needed Improvements in Estimating the Corn Crop.”


(A consolidated abstract.) 


Papers presented at this session will appear in Agricultural Economics Research, published by the 
Bureau of Agricultural Economics.

The Special Subcommittee of the House Committee on Agriculture (82nd Congress, 2nd Session) 
made a non-technical investigation of the methods used in making the official crop reports and issued a 
report—“Crop Estimating and Reporting Service of the Department of Agriculture.” The papers pre- 
sented at this session of the American Statistical Association critically reviewed the recommendations 
made in this Report, evaluated the reliability of the official crop reports for cotton and winter and spring 
wheat, pointed out the need for expanding the scope of crop reports on corn, and recommended modifica- 
tions in policy and methodology designed to increase the accuracy of estimates at the national level 
of major agricultural products. 

The unusually large plus departures of the monthly reports of 1951 cotton production from final
ginnings figures prompted this Congressional investigation.

The methods of mail sampling to a non-probability list sample of reporters and graphic regression 
estimating procedures are the same in principle for cotton as for all other major crops, except that be- 
ginning with the October crop report cotton ginnings are used to supplement the usual methods. Cotton 
is the only major crop, except tobacco, and to a lesser extent wheat, for which independent and accurate 
check data on production are available for evaluating the accuracy of crop reports and for use as the
dependent variable in making regression estimates. 

Cotton: When the ten-year average deviations of monthly national estimates of cotton production, 
from final ginnings figures for 1941-50, are compared with the period 1915-24, a significant downward 
trend is indicated: Deviations for the August reports are down nearly 40 per cent; for September less 
than 30 per cent; and for October and December about 50 per cent. The smaller decrease in the devia- 












tions of the September, as compared with the August, reports is surprising, as the cotton crop is usually 
“pretty well made” by September 1. 
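The comparison of ten-year average deviations can be reproduced with a short Python sketch. Since the summary does not define a deviation, the sketch assumes it is the absolute difference between a monthly estimate and final ginnings, expressed as a percentage of final ginnings. The data are hypothetical.

```python
from statistics import mean


def average_pct_deviation(estimates, finals):
    """Mean absolute deviation of estimates from final figures, in per cent."""
    return mean(abs(e - f) / f * 100 for e, f in zip(estimates, finals))


def percent_change(old, new):
    """Per cent change from old to new (negative means a decrease)."""
    return (new - old) / old * 100


# Hypothetical August estimates and final ginnings (thousands of bales)
# for an earlier and a later period.
early = average_pct_deviation([11000, 12500, 9000], [10000, 13000, 9500])
later = average_pct_deviation([10300, 12800, 9200], [10000, 13000, 9500])
print(f"change in average deviation: {percent_change(early, later):.0f} per cent")
```

A negative change corresponds to the "down nearly 40 per cent" kind of statement in the summary; whether such a drop is significant would still need a test across the individual years.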

More than one-third of the deviations of the 1951 national estimates of cotton production from final 
ginnings was caused by seriously overestimating the cotton acreage in cultivation on July 1. 

Winter Wheat: The official reports on winter wheat production for comparable periods of plant 
development are less reliable than for cotton—both as to magnitude of the deviations between monthly 
reports and final estimates and as to constant bias (underestimation). 

There has been some slight, though not significant, improvement since 1922 in the accuracy of the 
winter wheat crop reports when they are evaluated against the final revised estimates of production 
(1922-30 versus 1941-50). 

Spring Wheat: The percentage deviations between the reports of spring wheat production and the 
final revised estimates are significantly larger than is the case with winter wheat. An analysis of the 
record (1922-30 versus 1941-50) shows no decrease in magnitude of deviations of monthly crop reports 
except for the August report or in the tendency to underestimate production. This downward bias, how- 
ever, is not as great as with winter wheat. 

Corn: No satisfactory evaluation can be made of the reliability of crop reports on corn production, 
as no independent check data on production are available for comparison. In view of the wide variation 
from year to year in the moisture content of corn and, consequently, in feeding value, there has long 
been a demand for estimating corn production in terms of a constant moisture percentage. There is also a 
demand for bi-monthly crop reports for corn. 

It is recognized that there are obstacles to forecasting accurately the out-turn of a crop. Weather,
plant diseases, changing crop varieties, and reports obtained from farmers and others, which often are
biased by attitudes, all complicate the problem. The fact that crop forecasts have long been made by 
the government and by private forecasters in face of these difficulties indicates the need for them. 

Great reliance is placed upon the government crop reports by the trade, by processors, by farmers 
and by government action agencies. The greater the accuracy of official crop reports the smaller the 
element of risk that must be borne by the buyers and processors of agricultural products, and the nar- 
rower the price margin between the farmers and consumers. 

The methods used by the Crop and Livestock Reporting Service have not kept pace with develop- 
ments in scientific sampling of the last 15 years or in crop-weather relationship research extending over a 
longer period. 

Crop reports, issued on the 8th to 10th of the month, are based upon crop conditions reported 
largely by farmers over a several day period, ending on about the 2nd of the month. Traditional operat- 
ing policies (made mandatory by law in the case of cotton) prevent use either of readily available weather 
information for the first eight to ten days of the month or the official five-day weather forecasts. 

From the standpoint of the effect of production upon prices of the major crops, the crop reports of 
national production are of paramount importance, with regional estimates next in effect. Traditionally,
however, the primary objective of the government service has been to provide accurate state estimates.
From the standpoint of the reliability of a sample, the variability of the phenomena being sampled, both
in space and from year to year, is nearly as great within any one state as it is for a geographic region or
the entire country. Consequently, nearly as large a sample is required for any one state as for the entire 
country for a specified level of sampling precision. 
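The argument that a state needs nearly as large a sample as the whole country follows from the textbook sample-size formula: the required size depends on the variability and the precision wanted, and only weakly on the number of units in the population. The Python sketch below assumes simple random sampling and the same coefficient of variation in a state as in the nation. The official mail samples are not random samples, and all numbers are hypothetical.

```python
def required_sample_size(cv, rel_error, z=1.96, population=None):
    """Approximate simple-random-sample size for estimating a mean.

    cv         : coefficient of variation of the item being sampled
    rel_error  : tolerated relative error (e.g. 0.02 for 2 per cent)
    z          : normal deviate for the chosen confidence level
    population : number of units; None means effectively infinite
    """
    n0 = (z * cv / rel_error) ** 2
    if population is None:
        return n0
    # Finite population correction: it matters only when n0 is a
    # sizeable fraction of the population.
    return n0 / (1 + n0 / population)


# Hypothetical: the same variability in one state and in the whole country.
state = required_sample_size(0.5, 0.02, population=50_000)
nation = required_sample_size(0.5, 0.02, population=5_000_000)
print(round(state), round(nation))
```

A population a hundred times larger raises the required sample by only a few per cent in this example. Meeting a precision target in every state separately therefore costs far more than meeting it once for the national total.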

A full appreciation of these basic principles should lead to the adoption of the more realistic and 
useful primary objective of national estimates of maximum reliability for the major agricultural prod- 
ucts. Unless this is done, any moderate increase in appropriations of a few hundred thousand dollars is 
unlikely to result in any significant increase in accuracy of crop and livestock reports. 

Assuming that these “institutional” factors could be corrected and that the primary objective of 
the Crop Reporting Service could be made more realistic, there are two general lines of approach, from 
a methodological standpoint, that give definite promise of increasing the accuracy of official crop and 
livestock reports. These two approaches could be adequately implemented with an increase in appropria- 
tions of not more than 20 to 25 per cent. 

Since the beginning of the Crop Reporting Service, mail sampling has been practically the only 
method of sampling used in crop and livestock reporting. 

Area probability sampling and objective sampling of plant characteristics could be used to tre- 
mendous advantage in strengthening the mail sampling program by placing a sound foundation under 
the entire crop and livestock reporting program. 

The Congressional Committee Report recommended that the Department of Agriculture cooperate 
with the Bureau of the Census in taking an annual sample farm census in the late fall. This project 
should be implemented. A June survey, however, would provide a powerful means for improving the 
accuracy at the time of the year when it is most needed—namely, for acreage estimates which are used 
in connection with all the crop reports from July until the final December crop report. 













Probability pre-harvest field sampling of crops for the purpose of determining yield per acre and
quality or moisture content is essential when reports from farmers are subject to considerable
understatement bias that is not constant from year to year.

Certain private crop forecasters have found that cotton ginners are more reliable crop reporters for 
cotton than farmers; operators of local mills and grain elevators are better than farmers for reporting 
on wheat. 

If all information concerning crop conditions and weather available at the time crop forecasts are
made, including the official five-day weather forecasts, were utilized statistically, the accuracy of fore- 
casts of crop production would undoubtedly be increased, especially during periods of critical weather 
conditions and plant growth and development. The date to which the forecast relates would be ad- 
vanced by 10 to 12 days over the present system. 

The considerable amount of research as to the relationship of weather to crop yields, conducted 
during the late 1930’s, demonstrated that weather factors, as well as soil moisture, could be used 
statistically, along with the reported condition of a crop, to increase the accuracy of forecasts of crop 
yields per acre during the growing season. None of the results of this research are being used by the 
Service at this time. 


The Mathematical Biophysics of the Cardiovascular System. George Karreman, University of Chicago.

The propagation and reflection of pressure waves in a fluid enclosed within an elastic tube are 
studied on the basis of simple physical principles. In this treatment the concept of impedance is used. 
For the case of a tube with multiple characteristics the relation between the reflection coefficients in two 
consecutive sections is determined. The result is used to determine the relation between the impedances 
at the beginning of two consecutive sections. The obtained relation is applied to the case of a con- 
striction in the tube as occurs clinically in a coarctation. As a result an expression is obtained for the 
ratio of the pressures at both sides of the constriction in terms of the length and degree of constriction, 
the distances from the constriction and the velocities of propagation in the normal and constricted parts. 
From the ratio of the amplitudes of these pressures and their phase difference it is shown that in principle 
the site of constriction can be located. The clinical importance of this result is indicated. From similar
determinations on other clinical conditions, e.g., arteriosclerosis, it is shown that valuable information
about the degree of deviation in the wall thickness or the elasticity modulus might be obtained.
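The role of impedance at a single junction can be illustrated with the standard transmission-line relation for pressure waves, ignoring losses. This is a textbook relation offered as background, not Karreman's derivation for consecutive sections; the impedance values below are hypothetical. Other things equal, a constricted segment has a smaller cross-section and hence a higher characteristic impedance, so an incoming pressure wave is partly reflected.

```python
def reflection_coefficient(z1, z2):
    """Pressure reflection coefficient at a junction between two tube sections.

    A wave travelling in section 1 (characteristic impedance z1) meets
    section 2 (impedance z2). Positive values mean the reflected pressure
    wave has the same sign as the incident one.
    """
    return (z2 - z1) / (z2 + z1)


def transmission_coefficient(z1, z2):
    """Pressure transmission coefficient; equals 1 + reflection coefficient."""
    return 2 * z2 / (z2 + z1)


# Hypothetical: a constricted segment with three times the impedance.
r = reflection_coefficient(1.0, 3.0)
t = transmission_coefficient(1.0, 3.0)
print(r, t)  # 0.5 1.5
```

Matched impedances give no reflection. The further the constricted segment departs from the normal one, the larger the reflected wave, which is what makes the pressure ratio across the constriction informative.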


A Mathematical Theory of Capillary Exchange as a Function of Tissue Structure. George W. Schmidt,
University of Chicago.

An equation is developed giving the concentration of the venous blood in terms of the concentration 
of the arterial blood, the blood velocity, the capillary permeability, wall area, and density, the diffusion 
coefficient of the interstitial matrix and other tissue parameters. A discussion of the relative influence of 
the various parameters upon the exchange rate is presented. 

Equations are deduced giving the mean extra-cellular and the mean capillary concentrations in 
terms of the concentration of the arterial blood and the various tissue parameters. The assumption
used by some experimenters, that the mean capillary concentration is approximately equal to the
arterial concentration, is shown to be generally invalid.

Consideration is given to the kinds of experiments which would be useful in testing the validity of 
some of the assumptions and approximations of the theory. 
















BOOK REVIEWS 


Facts from Figures. M. J. Moroney. Baltimore, Maryland: Penguin Books 
Inc., 1951. Pp. 472. $1.25. 


M. A. Girshick, Stanford University


I have used this book as a text in an introductory course in statistics. I
have also had time to think about its value as a popular treatise on statistical
inference. The result is both enthusiasm and disappointment.

Moroney has written a truly remarkable popular exposition on what 
might be called classical statistics, and, in the process, almost caught the 
modern spirit as well. The book touches on practically all standard statistical 
techniques, ranging from graph construction to analysis of variance and 
covariance. No effort is spared by Moroney to make the techniques available 
to the reader. With each new technique he gives step by step computational 
procedures so that the mathematically untrained can more easily follow the 
meaning of the formulas and symbols. Because he has not entirely succeeded 
in conveying to the reader the modern concepts of statistical decision making,
however, his book does not fulfill the need for a good elementary text in 
statistics. Neither does it manage adequately to impart to the intelligent 
layman and to scientists in other fields the fundamentals of the modern 
theory of statistics. 

There are, in my opinion, at least two prerequisites for writing a popular 
treatise on any scientific subject. One is a good style and, if possible, a sense 
of humor of the kind that Moroney possesses. The other is a deep understanding
of the subject. No other form of writing tends to expose conceptual weaknesses
as glaringly as non-technical expository writing. This is particularly
true in a new field such as statistics where the concepts are still fluid. In 
fact, one could venture a guess that no statistician ever learns how much of 
his statistical knowledge is nebulous until he attempts to write a popular 
version of what statistics is or give an elementary course covering the 
fundamentals of statistics. 

Style is not Moroney’s difficulty. One is almost envious of the ease with 
which some complicated statistical concepts get unfolded and explained. His 
humor is delightful. He employs ridicule effectively, but not offensively, 
since it is seldom directed against statistics, but rather against its misusers. 

A clear understanding of modern statistical concepts does appear to be a 
problem to Moroney. In this respect, he seems to possess a split personality— 
an ailment common to many statisticians. One part of him is the industrial 
statistician. In this capacity he portrays with lucidity and insight the 
main features of statistics as a guide to action and a tool in decision making. 
The other part of him is the classical statistician. In this role his portrayal 
of statistical ideas becomes somewhat muddy. Here, by and large, decision 
making gives way to a ritual known as performing tests of significance. As 


645 







an industrial statistician he considers decision rules which lead to the 
acceptance or rejection of lots. As a classical statistician his decision rules 
lead to the acceptance or rejection of Null Hypotheses. Since, clearly, an 
hypothesis is conceptually on a higher plane of being than a mere lot, there 
is less of the know-how and down-to-earth flavor in his discussion of tests of 
hypotheses than is found in his discussion of acceptance inspection. Again, 
as an industrial statistician he insists that a rule, such as an acceptance 
inspection plan, must be evaluated by its Operating Characteristics. Not so 
in testing hypotheses. Here he no longer tells us to inquire what the conse- 
quences of a particular decision rule will be as a function of the possible 
states of nature. Instead, he recommends that we ascertain, by consulting 
an appropriate table, whether the result of the statistical test is “ . . . ‘Prob- 
ably significant,’ ‘Significant,’ or ‘Highly significant,’ depending on the 
probability level associated with the judgment” (page 218). This jargon is 
odious to him also—but not, unfortunately, the underlying idea, since 
admittedly he has no substitute for it. His way out is to claim that “... there 
can never be any question, in practice, of making a decision purely on the 
basis of a statistical significance test. Practical considerations must always 
be paramount” (page 218). But shouldn’t the practical considerations be 
incorporated in the statistical test to begin with? 
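The reviewer's standard, that a decision rule be judged by its Operating Characteristics, can be made concrete for a single sampling plan: the OC curve gives the probability of accepting a lot as a function of the unknown fraction defective. The Python sketch below uses the binomial approximation; the plan (n = 50, c = 2) is hypothetical.

```python
from math import comb


def prob_accept(n, c, p):
    """Probability of accepting a lot under a single sampling plan.

    n : sample size drawn from the lot
    c : acceptance number (accept if at most c defectives are found)
    p : true fraction defective (binomial approximation to the lot)
    """
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c + 1))


# Operating characteristic of a hypothetical plan: n = 50, c = 2.
for p in (0.01, 0.02, 0.05, 0.10):
    print(f"p = {p:.2f}  P(accept) = {prob_accept(50, 2, p):.3f}")
```

Read the same way, a test of a hypothesis has a power function that plays the role of the OC curve. It shows how often each action is taken under each possible state of nature, which is the comparison the reviewer asks for.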

At this point it is probably clear that the criticism I am making is in 
reality directed at every elementary statistics book on the market. And most 
such books do not treat classical statistics half as competently as Moroney 
does in Facts from Figures. It is unfortunately true that the material going 
into most elementary statistics texts lags behind the development of statisti- 
cal theory by a decade or more. It is high time that writers of elementary 
texts become aware of and begin to emphasize the following fact about 
statistics which is common knowledge to many. All branches of statistics, 
not just acceptance inspection and sampling surveys, deal with the same 
basic problem, namely, the problem of decision making in the face of un- 
certainty. All decision rules, not just acceptance inspection plans, must be 
evaluated by their consequences. These consequences are expressible in 
terms of risks, or more intrinsically, in terms of the probabilities of taking 
the various permissible actions which are induced by the experiment, deci- 
sion rule, and the possible states of the system. In brief (and with due 
apologies to Moroney), not facts from figures but rather decisions from 
observations should become the main emphasis in elementary statistical 
observations. 

The insistence that we discuss and display, as ingredients in a decision 
making situation, the unknown states of nature, the possible experiments, 
the available decision rules, and consequences of these rules as a function 
of the unknown states is, in my opinion, a primary prerequisite for any 
intelligent approach to statistics. I am convinced that had Moroney been 
aware of this and understood it he would have written the elementary book 
that is needed. In addition, he would have avoided many fundamental 





BOOK REVIEWS 647 


conceptual mistakes as, for example, confusing parameters of distributions 
with statistics (Chapters 4 and 5), performing a two-sided test of a hypothesis 
when a one-sided test is called for (pages 222 and 228) or explaining the existence
of two regression lines by the fact that “... when we estimate y
from a given value of x, it is the sum of the squares of the discrepancies in y
which has been minimized. When we estimate x from y, it is the sum of the
x discrepancies which have been minimized.” (Presumably, we need only to
change our method of estimation in order to abolish the existence of the 
two regression lines.) One could take issue with Moroney on many other 
statements found in the book, but I believe they all flow from the same 
fundamental conceptual weakness. 
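The reviewer's parenthetical can be made concrete. Least squares of y on x gives the slope b_yx = S_xy/S_xx, and least squares of x on y gives b_xy = S_xy/S_yy. The two lines estimate different conditional means, and because b_yx · b_xy = r², they coincide in the same x-y plane only when |r| = 1. The sketch below uses made-up data.

```python
from statistics import mean

def regression_slopes(xs, ys):
    """Return (slope of y on x, slope of x on y, correlation r)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    b_yx = sxy / sxx  # predicts the mean of y for a given x
    b_xy = sxy / syy  # predicts the mean of x for a given y
    r = sxy / (sxx * syy) ** 0.5
    return b_yx, b_xy, r

xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
b_yx, b_xy, r = regression_slopes(xs, ys)
# Drawn in the x-y plane, the x-on-y line has slope 1 / b_xy, which differs
# from b_yx unless b_yx * b_xy (= r**2) equals 1.
print(b_yx, 1 / b_xy, r**2)
```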

Nonetheless, this little book is a joy to read and can be highly recom- 
mended to mature statisticians for the sheer fun of reading it; and to students 
for background material. 


An Introduction to Statistics. Charles E. Clark (Associate Professor of Mathe- 
matics, Emory University, Georgia). New York: John Wiley & Sons, Inc., 1953. 
Pp. x, 266. $4.25. 


Z. S. Malinowski, University of Connecticut


Out of 218 pages of text approximately 18 pages are devoted to concepts
of descriptive statistics before a final chapter of 30 pages on simple 
correlation. The only subject matter taken up in the field of descriptive 
statistics is the frequency distribution, the arithmetic mean, the standard 
deviation, the histogram and correlation. Questions and answers help to de- 
velop the frequency distribution and histogram more completely but prob- 
ably not enough to prepare the student adequately to criticize the mid- 
values in the table used on page 140. These aspects of descriptive statistics 
(except for correlation) are introduced only because they are used in the 
text to develop statistical inference. In the final chapter on correlation, how- 
ever, inference is not emphasized at all. No mention is made of the reliability 
of the coefficient of linear correlation nor of the standard error of estimate. 
This seems unfortunate in a text which is so obviously an introduction to 
statistical inference.

This text is, then, comparable to S. S. Wilks’ Elementary Statistical 
Analysis and Dixon and Massey’s Introduction to Statistical Analysis. 
For the instructor who feels justified in the complete abandonment of the 
median, mode, quartiles, percentiles, ogives, graphic analysis other than the 
histogram, significant digits and rounding, sources of data, index numbers 
and time series, the text is an adequate one, provided the teacher guides 
the student carefully past a few of the difficulties mentioned below. Except 
for these, the explanations are fairly consistently lucid and the development 
well organized. Especially in the first sections of the book (on permutations 
and combinations, the histogram, and linear interpolation), both the prob- 
lems and the answers play an excellent and essential role in the exposition. 





648 AMERICAN STATISTICAL ASSOCIATION JOURNAL, SEPTEMBER 1953


(Odd-numbered problems are answered in an appendix of 34 pages.) Although 
the author maintains some of the format of a mathematics text (theorems 
and proofs), the presentation is essentially non-mathematical—within the 
grasp of the student who has not had college algebra. Computation tech-
niques receive what is probably a bare minimum of space.

This is a text which can reasonably be covered in a one-semester course in 
statistics. Professor Clark does not attempt to cover completely any more 
than sample means, sample proportions, and differences between sample 
means and sample proportions. All of this is presented in very great detail 
in the first 154 pages. The order of presentation of these concepts is as fol- 
lows: introduction to statistical inference (4 pages) ; permutations and combi- 
nations (8 pages); probability (30 pages); frequency and probability distri- 
butions (48 pages); the reliability of sample means and probabilities (48 
pages); the significance of the difference between two sample means or per- 
centages (15 pages). After this very detailed development, the author men- 
tions analysis of variance (14 pages) and chi-square (16 pages) primarily, 
it seems, to emphasize the limitations of the techniques already developed. 
Even with this limited objective, an adequate reason why the variance be- 
tween samples is comparable to the variance within samples could have 
been presented. Assumptions are not mentioned in these 14 pages on the 
analysis of variance. 

Here are a few more specific points which impressed this reviewer and in 
which the prospective user of the text might be interested. 

In his chapter on probability, the author defines three types of probability: 
empirical probability, a priori probability, and statistical probability. The 
definitions are rather carefully set forth and adhered to throughout the text. 
This reviewer prefers to emphasize one definition of probability (somewhat 
akin to what Professor Clark calls “statistical probability”) and to use the 
adjectives “empirical” and “a priori” simply to distinguish between different 
methods of approximating probability. The author’s definition of empirical
probability necessitates referring to sample proportions as sample prob- 
abilities. Probably his only failure to use the term “sample probability” 
consistently is in the heading of Chapter 6. The definition of statistical prob- 
ability given on page 19 involves the concept of confidence, which might 
preferably be attached to estimates of probability rather than to the prob- 
ability itself. 

In the text proper (as distinguished from the questions and answers) 
Professor Clark does not introduce the term “null hypothesis” nor the word 
“hypothesis” until the chapter on inferences from chi-square. Instead, he 
limits himself to the development of two-limit and one-limit confidence 
intervals. Unfortunately, the application of confidence intervals to the 
problem of the testing of hypotheses in the case of the normal approximation 
to the binomial, where the sample proportion is directly substituted into the 
formula for the standard error of a sample proportion, can introduce a serious 
error. For example, the illustration in Section 8.2 is incorrect not only for the 
reason stated by the author but because the standard error of a sample 
proportion is computed incorrectly. Instead of a t of 2.3, the use of 1/6 instead
of 1/12 would have given a t of 1.7. This correction could make a difference
in the conclusions. 
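The correction concerns which proportion enters the standard error. When testing whether a sample proportion is consistent with a hypothesized value p₀, the standard error sqrt(p₀(1 − p₀)/n) is computed from p₀, not from the sample proportion. The review does not give the sample size. The sketch below therefore assumes, for illustration only, that 1/6 is the hypothesized proportion, 1/12 the observed one, and n = 60, values that happen to reproduce the 2.3 and 1.7 quoted in the review.

```python
from math import sqrt

def z_statistic(p_hat: float, p0: float, n: int, use_null_p: bool = True) -> float:
    """Normal-approximation statistic for a sample proportion.

    With use_null_p=True the standard error uses the hypothesized p0;
    otherwise it plugs in the sample proportion p_hat.
    """
    p = p0 if use_null_p else p_hat
    return (p_hat - p0) / sqrt(p * (1 - p) / n)

# Hypothetical figures: hypothesized p0 = 1/6, observed 1/12, n = 60 (assumed).
z_null = z_statistic(1 / 12, 1 / 6, 60, use_null_p=True)
z_plug = z_statistic(1 / 12, 1 / 6, 60, use_null_p=False)
print(round(z_null, 1), round(z_plug, 1))  # -1.7 -2.3
```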

On page 236, in his answers to problems 11, 13, 15 and 17, the author 
handles this same type of problem more adequately. In fact, problem 17 is 
essentially the same problem (as in Section 8.2) but analyzed properly. The 
answers to these four problems are basically an introduction to the testing 
of hypotheses without any of the usual terminology. A more complete 
presentation (including some exposition of the error of the second kind) in 
the text proper would be preferable. 

In the reading and problems on differences between sample means and 
sample proportions Professor Clark equates the following: “... we can say 
with 99% confidence that the first universe has a greater mean than the 
second universe” (page 145) and “we found that with 99% confidence we 
can say that the two universes involved have different means” (page 149). 
The author is essentially working with only the former concept because of 
his restriction to confidence intervals. In this same chapter, failure to work 
with one-limit confidence intervals yields lower confidence levels than are 
actually applicable. Thus while theorem 6.5 is correct in saying that “we 
can say with c% confidence that the mean of the first universe is greater 
than the mean of the second universe,” the last paragraph in Section 6.6 
on page 150 is incorrect in its interpretation that “with confidence greater 
than 99.7% we make no inference by theorem 6.5.” Actually the confidence 
can be as high as c% plus one-half of (100 minus c)% and should be so stated 
in theorem 6.5. This would make it analogous to theorem 5.17.2 which also 
is a one-limit confidence interval. Theorems 6.8, 6.12.1, and 6.12.2 should be 
comparably adjusted. 
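The reviewer's adjustment reduces to a one-line formula. A limit taken from a two-limit c% interval and used by itself carries confidence c + (100 − c)/2 percent; for c = 99.7 that is 99.85%. Below is a sketch in Python that assumes normal-theory limits and cross-checks the formula with the standard library's `statistics.NormalDist`.

```python
from statistics import NormalDist

def one_limit_confidence(c: float) -> float:
    """Confidence (percent) of a single limit taken from a two-limit c% interval."""
    return c + (100 - c) / 2

# Cross-check under normal theory: the two-limit c% interval uses the quantile
# z with Phi(z) = 1 - (1 - c/100)/2, and one limit alone has confidence Phi(z).
std_normal = NormalDist()
for c in (90.0, 95.0, 99.0, 99.7):
    z = std_normal.inv_cdf(1 - (1 - c / 100) / 2)
    print(c, one_limit_confidence(c), round(100 * std_normal.cdf(z), 4))
```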

Professor Clark does not give enough space to distinguishing between 
his confidence statements about samples, which dominate the exposition, 
and his confidence statements about populations. From the presenta- 
tion given, it seems unlikely that the student will perceive that the confidence 
statements about population parameters must have different numerical 
limits for each statement (in accordance with the sample results obtained) 
for the given confidence to work out. 

In the nature of more minor criticisms are the following. In the para- 
graph in the center of page 96, the word “bias” is used in reference to random 
sampling error. At the end of this same paragraph there is an implication
that stratified samples are not random samples. Problem 5 on page 89 should 
not be introduced without a continuity correction. The correct answer to 
two decimal places is .17 which is also obtained with the proper continuity 
correction. The book’s answer (obtained without the continuity correction) 
is .10. 
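The sketch below shows what the continuity correction does. It compares the exact binomial probability P(X ≤ k) with its normal approximation evaluated at k + 1/2 (corrected) and at k (uncorrected). The example, at most 3 heads in 10 tosses of a fair coin, is hypothetical and is not claimed to be the book's Problem 5.

```python
from math import comb, erf, sqrt

def normal_cdf(x: float) -> float:
    """Standard normal distribution function via the error function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def binom_cdf(k: int, n: int, p: float) -> float:
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))

def normal_approx(k: int, n: int, p: float, correction: bool) -> float:
    """Normal approximation to P(X <= k), with or without the correction."""
    mu, sigma = n * p, sqrt(n * p * (1 - p))
    x = k + 0.5 if correction else k
    return normal_cdf((x - mu) / sigma)

# Hypothetical example: at most 3 heads in 10 tosses of a fair coin.
print(round(binom_cdf(3, 10, 0.5), 4),
      round(normal_approx(3, 10, 0.5, True), 4),
      round(normal_approx(3, 10, 0.5, False), 4))
```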

Very few of the trivial errors that tend to appear in the first printing of a 
new text were noticed. On page 162, m.=4531 should be m,=4731. On page 
164, 2(12,500 − 47,777)² should be 2(12,500 − 4,777)². On page 233, the answer
to the fifth part of problem 7 under Section 3.3 should be 50 instead of .50.
On page 224, the answer to the third part of problem 1 is incorrect. One 
part of the answer is also left out for this same problem. (In reference to 
all these problems it would help considerably for class- and home-work if the 
parts of problems were identified by letter.) A final minor error noticed: the 
description set forth under the histogram on page 228 does not agree with 
the histogram itself. 


Advanced Statistical Methods in Biometric Research. C. R. Rao (Professor of
Statistics, Indian Statistical Institute, Calcutta). New York: John Wiley & 
Sons, Inc.; London, Chapman & Hall, Limited, 1952. Pp. xvii, 390. $7.50. 


Rosedith Sitgreaves, Stanford University


THE author’s object is “to present a number of statistical techniques,
keeping in view the requirements of both the student who questions the
basis of a particular method employed and the practical worker who seeks 
a recipe for the reduction of his data.” In keeping with this purpose, the first 
two chapters are devoted to mathematical theory, the first to the algebra 
of vectors and matrices and the second to probability distributions. The 
next five chapters deal with methods of estimation and tests of hypotheses. 
The last two chapters are concerned with statistical methods in problems of 
classification. By and large, the author assumes that the reader is familiar 
with the fundamentals of probability theory and univariate statistical 
inference. 

The book presents a wide variety of useful statistical techniques. Chapter 
3 treats linear estimation, tests of linear hypotheses, combination of weighted 
observations, tests of hypotheses with a single degree of freedom, analysis of 
variance, theory of statistical regression, and the problem of least squares 
with two sets of parameters. Chapter 4 is devoted to the general problem of 
estimation with discussions of minimal variance estimates, maximum likeli- 
hood estimates, and sufficient statistics. Chapter 5 deals with large sample 
tests of statistical hypotheses, particularly tests based on statistics with 
limiting normal distribution, or a limiting chi-square distribution under the 
null hypothesis. Tests of homogeneity of variances and correlation coeffi- 
cients are given in Chapter 6. Chapter 7 discusses tests of significance in 
multivariate analysis. Two types of tests are presented, namely, tests based 
on discriminant functions where the multivariate problem is reduced to a
univariate problem by considering linear compounds of the original variables,
and tests based on Wilks’ lambda criterion for problems representing multi-
variate extensions of univariate analysis of variance.
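One way to reduce a multivariate problem to a univariate one through a linear compound is Fisher's two-group linear discriminant, whose coefficients are w = S⁻¹(m₁ − m₂), with S the pooled within-group covariance matrix. This is the textbook form, not necessarily Rao's notation or procedure. The data are hypothetical, and the sketch is limited to two variables so that the 2 × 2 inverse can be written out by hand.

```python
def mean_vector(group):
    """Means of the two variables in a group of (x1, x2) observations."""
    n = len(group)
    return [sum(row[i] for row in group) / n for i in range(2)]

def fisher_discriminant(group1, group2):
    """Coefficients w = S^{-1}(m1 - m2) for two groups of bivariate data,
    where S is the pooled within-group covariance matrix."""
    m1, m2 = mean_vector(group1), mean_vector(group2)
    s = [[0.0, 0.0], [0.0, 0.0]]  # pooled sums of squares and products
    for group, m in ((group1, m1), (group2, m2)):
        for row in group:
            d = (row[0] - m[0], row[1] - m[1])
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    dof = len(group1) + len(group2) - 2
    s = [[v / dof for v in r] for r in s]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]
    diff = (m1[0] - m2[0], m1[1] - m2[1])
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]

def score(w, x):
    """The linear compound w . x that replaces the two original variables."""
    return w[0] * x[0] + w[1] * x[1]

g1 = [(2, 3), (3, 3), (4, 5), (3, 4)]
g2 = [(1, 1), (2, 1), (1, 2), (2, 2)]
w = fisher_discriminant(g1, g2)
print(w, [round(score(w, x), 2) for x in g1], [round(score(w, x), 2) for x in g2])
```

Any univariate comparison of the two groups can then be made on the single compound `score(w, x)`.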

Chapters 8 and 9 include, in addition to the classical problem of classifi- 
cation, discussions of the resolution of a mixed series into two Gaussian 
components, the allocation of a number of individuals to two or more 
groups, the problems of optimum selection, and the problem of classifying 
different groups of individuals to form a significant pattern. 
Throughout the book, numerical examples, drawn largely from anthro- 
pology, genetics, and general biology, are worked out in detail to illustrate 
the computational procedures involved. In addition, a number of exercises 
and problems are provided for the more mathematically minded reader. A 
list of references is given at the end of each chapter. Although the book is 
addressed to biometric workers, a number of important biometric problems 
are not mentioned. These include problems of probit analysis and the 
general design of experiments, among others. 

The tests given in Chapters 3-7 are generally presented from the classical 
viewpoint of testing a null hypothesis against unspecified alternatives. The 
notion of the power of a test and other concepts of the Neyman-Pearson 
theory are not introduced until Chapter 8, preliminary to the discussion of 
multiple classification problems. In the treatment of the latter problems, 
use is made of the concept of losses associated with various wrong decisions, 
and optimum decision procedures are obtained for a priori probabilities of 
the different alternatives. Rao differentiates sharply between tests of null 
hypotheses and problems of multiple classification and appears to doubt 
the notion that all problems of statistical inference can be given a general 
formulation. He feels that although various attempts have been made to 
build up a general theory, it is difficult to argue whether or not such a theory 
exists. These views are surprising in the light of the recent researches of 
Wald and others. 
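The loss-based procedures the review attributes to Rao's Chapter 8 follow the standard Bayes rule. The a priori probabilities are combined with the likelihood of the observations, and the action with the smallest expected loss is chosen. Below is a minimal sketch with hypothetical priors, likelihoods, and losses; it is not Rao's own worked example.

```python
def bayes_action(priors, likelihoods, loss):
    """Return (index of best action, expected posterior loss of each action).

    priors[i]       a priori probability of alternative i
    likelihoods[i]  probability (or density) of the observed data under i
    loss[a][i]      loss incurred by action a when alternative i is true
    """
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)
    posterior = [j / total for j in joint]
    expected = [sum(row[i] * posterior[i] for i in range(len(posterior)))
                for row in loss]
    best = min(range(len(loss)), key=lambda a: expected[a])
    return best, expected

# Hypothetical two-population problem: action a assigns the individual to
# population a. Misassigning a member of population 1 is taken to cost 5,
# misassigning a member of population 0 to cost 1.
best, expected = bayes_action([0.7, 0.3], [0.2, 0.6], [[0, 5], [1, 0]])
print(best, expected)
```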

The book should prove a valuable source of statistical knowledge for work- 
ers in both theoretical and applied statistics. It is likely that more insight 
into the problems considered in this book could be gained if they were 
treated more generally in the unified framework of statistical decision theory. 
However, this task is perhaps best left for the future, since the development 
of the theory of decision functions has thus far outstripped its application. 


Statistical Methods for Chemical Experimentation. W. L. Gore. New York: 
Interscience Publishers, Inc., 1952. Pp. vii, 210. $3.50.


See the article by C. Daniel, pp. 476-85, in this issue. 


Econometrics. Gerhard Tintner. New York: John Wiley & Sons, 1952. Pp. xiii, 
370. 


Daniel B. Suits, National Bureau of Economic Research


THE testing of economic hypotheses and the measurement of theoretically
meaningful economic relations is a science in which direct experiment is
virtually impossible. Empirical economic research must rely on data derived 
from such facts as the actual operations of the economy happen to generate, 
and can test and measure only in terms of such situations as have arisen. 
The result is rather special limitations on the research techniques which 
can be employed, and method appropriate to any given problem, like gold, 
is where you find it. 
The quantity and quality of tools available to the would-be prospector in
this field have greatly increased in the last two decades or so. Thus, although
Tintner has provided a nontechnical introduction to econometrics as Part I
of his book, it is his primary purpose to collect and present a wide selection 
of these newer methods for the economic researcher. 

The technical portion of the book begins with Part II, “An Introduction 
to Multivariate Analysis,” which includes such items as discriminant analy-
sis, canonical correlations, the treatment of errors in the variables and
certain problems of identification. Part III, “Some Topics in Time Series 
Analysis,” includes discussions of trends and seasonal adjustments, auto- 
correlation and stochastic processes, and transformations of time series 
data. Except for some of the illustrative examples there is no claim to 
originality. The topics are well chosen and the discussion is accompanied 
throughout by ample reference to sources and collateral material. 

Tintner has done good service in collecting these scattered techniques, but 
the over-all organization of his presentation is not well thought out from 
the point of view of the usefulness of the volume as an aid to the economic 
researcher. 

The organization is based upon statistical topics rather than research 
problems. The result is that special cases of generally related research prob- 
lems are given widely separated treatment, often as coordinate topics, 
with inadequate attention to the relationships among them. Indeed some 
of the cross references are more confusing than helpful and point away from, 
rather than toward, the nature of the relationships. For example, the 
problem of identification as discussed in Chapter 7 (pp. 154-84) gives the 
reader an impression of complete generality, although in fact the conditions 
given there, that a particular member of a system of stochastic equations 
be identified, do not apply to recursive systems, where the problem of identi- 
fication does not arise. However, when the latter are discussed under the 
heading “Stochastic Difference Equations and Process Analysis” (Section 
10.3.7, pp. 275-277), the reader is referred back to Chapter 7. Moreover the 
reader who attempts to apply the conditions of Chapter 7 to the four equa- 
tion recursive system of Section 10.3.7 will find that, while two of the equa- 
tions of the system appear to be under-identified by this test, no mention or 
justification is given the seeming contradiction. Again, a diagonal recursive 
system, under the heading “Systems of Stochastic Difference Equations” 
(Section 10.3.4, pp. 267-69) is treated as if it were essentially different from, 
rather than a special case of, a triangular system. This impression is rein- 
forced by the forward reference concluding the section: “ . . . [The illustra- 
tive example given] . . . can evidently be considered only as purely descrip- 
tive, and the individual equations cannot be identified with meaningful 
economic relationships. ... A method which is based on the idea of process 
analysis, where the individual equations have definite economic meaning, 
will be presented in section 10.3.7” (p. 269). The clearly-suggested difference 
in method is, of course, mistaken. The real difference in Tintner’s treatment 
of the two topics lies in his selection of illustration. 
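For one equation of a simultaneous (nonrecursive) system, the usual counting rule is the order condition. The number of predetermined variables excluded from the equation must be at least the number of endogenous variables included in it, minus one. The sketch below implements only that count. It is offered as the standard textbook rule, not as Tintner's exact formulation, and the count is necessary but not sufficient: the rank condition must also hold. As the review notes, the identification problem does not arise in recursive systems, so applying this count to them can mislead.

```python
def order_condition(endog_included: int, predet_included: int, predet_total: int) -> str:
    """Classify one equation of a nonrecursive system by the order condition.

    endog_included   endogenous variables appearing in the equation
    predet_included  predetermined variables appearing in the equation
    predet_total     predetermined variables in the whole system
    """
    excluded = predet_total - predet_included
    needed = endog_included - 1
    if excluded < needed:
        return "under-identified"
    if excluded == needed:
        return "just identified, if the rank condition also holds"
    return "over-identified, if the rank condition also holds"

# Hypothetical demand equation: quantity and price are endogenous; income is
# included; a wage index (in the supply equation only) is excluded.
print(order_condition(endog_included=2, predet_included=1, predet_total=2))
```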

The organization of the book has another unfortunate by-product in that 
it gives rise to no occasion on which to discuss certain cases which do not fit 
the categories used. The problem of identification as treated in Chapter 7 
precedes the time series discussion, hence no lagged variables are included 
in the treatment. In the section on time series, however, the treatment is 
limited to complete recursive systems—i.e., those in which there are no 
exogenous variables. Moreover, in neither case is there a discussion of the 
role of exact relations among the variables—e.g., definitions—although 
such a discussion is certainly in order. Without the advantage of a more 
general discussion, the reader who is faced with a recursive system which 
also contains exogenous variables, or a nonrecursive system with lagged 
variables, or with a system containing a definitional relation is left in the 
dark. 

A few well placed pages devoted to the nature of stochastic systems in 
general and the various special cases frequently encountered would have 
contributed both greater coherence and wider applicability to the work. 

Quite apart from the question of organization, the presentation frequently 
leaves the reader with little understanding of the kind of research problems 
to which a given topic might be applicable. He must rely largely on the 
illustrative examples for guidance. Some of these leave nothing to be 
desired. Indeed, Tintner’s own application of the method of weighted re- 
gressions to test the hypothesis that the demand and supply of British 
labor are homogeneous functions of order zero represents econometrics at its 
best. The economic theory at issue is discussed, the problem is formulated, 
the test is made and critically evaluated. 

On the other hand, the examples are in some cases obscure and are fre- 
quently so removed from a research context as to be meaningless. This is 
particularly notable in Part III where the same time series are used over 
and over, subjected without regard to aptness to whatever manipulation 
the current topic may require. Thus the American meat consumption series 
is involved in stochastic systems to illustrate the just-identified equation 
(pp. 168-71), and the over-identified equation (pp. 177-84). It is fitted with 
a cubic trend (pp. 195-98), and is subjected to Fourier analysis (pp. 220-27). 
Its correlogram is analyzed to test for hidden periodicities (pp. 225-27), 
to test whether the series might be represented as a moving average of a 
stochastic variable (pp. 290-92), and to test whether a second order differ- 
ence equation is satisfied (pp. 298-99). It is again analyzed as a difference 
equation of second order (pp. 262-63), and of third order (p. 267), and as a
difference equation with errors in the variables (pp. 274-75). It also illustrates 
the variate difference method (pp. 320-23). In addition this series is used in 
regression equations with other variables on a number of occasions. 

This is all done without regard to the purpose of research or the meaning- 
fulness of any particular application. If it does not confuse, it certainly 
does not help the prospecting researcher, who may well wonder, for example, 
what to make of the conclusion that “it is not impossible that [the deviation 
from a cubic trend of] the American meat consumption series follows 
stochastic process of the type of moving averages” (p. 293). 
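The correlogram mentioned in the review is the sequence of sample autocorrelations r_k at lags k = 1, 2, and so on. The sketch below uses the common estimator that divides every lag by the lag-zero sum of squares; other variants exist. The short alternating series is a hypothetical test input, not Tintner's meat consumption data.

```python
def autocorrelations(series, max_lag):
    """Sample autocorrelations r_1, ..., r_max_lag (the correlogram), each lag
    divided by the lag-zero sum of squared deviations from the mean."""
    n = len(series)
    m = sum(series) / n
    dev = [x - m for x in series]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
            for k in range(1, max_lag + 1)]

# Hypothetical series that alternates in sign: r_1 is strongly negative.
print(autocorrelations([1, -1] * 4, 2))  # [-0.875, 0.75]
```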

Even the more or less mechanical aspects of presentation leave a great
deal to be desired. The exposition is sometimes so abbreviated that it would 
be difficult for a reader not already familiar with the material to see the 
point. In Section 10.3 (pp. 269-72) for example, the reader is plunged into a
discussion of distributed lags as treated by Roos in his Factors Influencing 
Residential Building, without having been told explicitly what distributed 
lags are. Moreover the rationale of Roos’ fairly complicated economic model 
is left obscure to the point that not all the variables in the system are defined.
Again, the description of the use of orthogonal polynomials in trend fitting 
(pp. 190-98) is carried out without a clear explanation of what they are. 
And, although more than ten pages are devoted to the exposition and appli- 
cation of a method for obtaining consistent estimates of the parameters of 
a single over-identified equation (pp. 172-84), no motivation is given for 
the manipulations, nor is the question of why over-identification is a problem 
ever raised. 
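For equally spaced observations, orthogonal polynomials replace powers of time with polynomials that are uncorrelated over the observed time points. Each coefficient can then be computed separately, and adding a higher-degree term leaves the earlier coefficients unchanged. Below is a sketch for a quadratic trend, with time u measured from its mean, P₁(u) = u and P₂(u) = u² − (n² − 1)/12. The test series is hypothetical.

```python
def orthogonal_trend(y):
    """Quadratic trend for equally spaced y via orthogonal polynomials.

    Returns (b0, b1, b2, fitted), where fitted = b0 + b1*P1 + b2*P2.
    """
    n = len(y)
    center = (n - 1) / 2
    u = [t - center for t in range(n)]           # time measured from its mean
    p1 = u
    p2 = [v * v - (n * n - 1) / 12 for v in u]   # orthogonal to 1 and to p1
    b0 = sum(y) / n
    # Orthogonality makes each coefficient a separate ratio of sums.
    b1 = sum(a * b for a, b in zip(y, p1)) / sum(a * a for a in p1)
    b2 = sum(a * b for a, b in zip(y, p2)) / sum(a * a for a in p2)
    fitted = [b0 + b1 * a + b2 * b for a, b in zip(p1, p2)]
    return b0, b1, b2, fitted

# Hypothetical series lying exactly on a quadratic, so the fit reproduces it.
ys = [1 + 2 * t + 3 * t * t for t in range(5)]
b0, b1, b2, fitted = orthogonal_trend(ys)
print(b0, b1, b2, fitted)
```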

In spots the text is carelessly worded. Thus, the necessary and sufficient 
conditions that a given equation in a stochastic system be just identified 
(p. 167) are first scrambled together in a non sequitur before being straight- 
ened out in the following paragraph. Finally, the number of misprints, some 
occurring in functions and equations, is astonishing. 

The unfortunate conclusion is that Tintner’s book will best serve those 
already reasonably familiar with econometrics and econometric method 
who want a catalog to the literature. It is not a reliable and useful guide to 
the prospecting economic researcher. 


The Theory of Linear Estimation. M. V. Jambunathan. Bangalore, India: India 
Book Company, 1951. Pp. vi, 84. Rs. 3/-. 


William G. Madow, University of Illinois


AS FAR as it goes, this is a neat little book, but it suffers from a major
defect. It omits material that is needed today by any person who wishes 
to use the subject matter of the book. 

Before discussing the lacks of the book let us briefly outline its contents. 
It is in two parts. Part I (pp. 1-44) deals with linear estimation. As here 
defined, this is the estimation of a linear function of parameters of which the 
expected values of independent random variables are linear functions. The 
best unbiased estimate is obtained and its variance is derived. Simple 
algebraic tools are used. Part II (pp. 47-83) is entitled “Testing of Hypothe- 
sis.” After making the usual normality hypotheses, the usual tests of signifi- 
cance are obtained. Again, it is concisely done. 
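The simplest instance of the linear estimation problem outlined above has a single parameter θ, with E(yᵢ) = θ and independent observations of known variances σᵢ². The best linear unbiased estimate weights each observation by 1/σᵢ², and its variance is 1/Σ(1/σᵢ²). Below is a minimal Python sketch with hypothetical numbers; the general case, a linear function of several parameters, leads to weighted least squares.

```python
def blue_common_mean(values, variances):
    """Best linear unbiased estimate of theta when E(y_i) = theta and the y_i
    are independent with known variances. Returns (estimate, its variance)."""
    weights = [1 / v for v in variances]
    total = sum(weights)
    estimate = sum(w * y for w, y in zip(weights, values)) / total
    return estimate, 1 / total

# Hypothetical: two measurements of the same quantity, the second less precise.
print(blue_common_mean([10.0, 12.0], [1.0, 4.0]))
```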
The book does not purport to present new results but only to be a “succinct 
account” of its subject matter. In omitting any discussion of the power 
function and decision theory, the usefulness of the book has been greatly 
reduced. Tang’s fundamental paper is not even mentioned, let alone the 
later work of Wald and others. These omissions are serious since the student 
who reads the book may be led to feel either that no further work has been 
published, or that it is unimportant, or that it has no value in practice. Yet 
recent research, in the analysis of variance as elsewhere, has immediate and 
important practical applications. 

To summarize: What the book does, it does well. But it does not include 
results that are of immense importance. Since no real applications are pre- 
sented, the omissions cannot be justified by any lack of necessity of the 
omitted work in view of the particular applications made in the book. 


Introduction to the Theory of Games. J. C. C. McKinsey. New York: McGraw- 
Hill Book Company, 1952. Pp. 371. 


Irwin Bross, Cornell University Medical College 


— three hundred years ago a French gambler happened to ask a mathe- 
matician about the odds in a dice game that was popular at that time. 
Out of this innocent query was to grow the subject of mathematical prob- 
ability and, in direct line of descent, the topic of mathematical statistics. 
There are still vestiges of the game heritage in modern statistical practice. 
Dice and card games are often used as examples in courses in statistics. 
Occasionally dice or numbers in a hat are used to randomize an experiment,
and an important modern technique is called the “Monte Carlo Method.”

Insofar as games of pure chance are concerned, such as honest dice or 
roulette, the mathematical analysis has been quite successful. The mathe- 
matician’s advice concerning odds has been tested by gamblers over a period 
of many years and has been found to be sound. The record of mathematical 
analysis in more complex games such as poker, chess, or bridge has not been 
too successful. For example, it is well known that a person who plays poker 
strictly according to the published tables of odds will lose to a good poker 
player. 

The failure of the earlier mathematical analyses of games such as poker 
was due to the omission of an important element—strategy. In poker or chess 
or bridge the player has personal choice and can derive benefit from an in- 
telligent line of play. 

The first comprehensive attack on the problems of games of strategy was 
made by J. von Neumann in 1928. As the title Theory of Games and Economic 
Behavior would indicate, von Neumann’s pioneer book (with Morgenstern) 
was intended to apply not merely to parlor games, but also to economic 
situations. 

J. von Neumann’s original work has been extended and elaborated into 
a new sub-field of mathematics called “game theory.” A very excellent ac-
count, which is concerned “almost entirely (with) the purely mathematical
aspects of the theory,” is provided by McKinsey’s Introduction to the Theory
of Games. The first chapters of the book (Chapters 1 through 8) are about
at the level of a B.A. in mathematics. Considerably more advanced mathe-
matics are utilized in the latter chapters. Clearly written and well planned,
the book provides a very readable discussion of game theory (including more 
recent advances). 

Statisticians might be surprised to learn that Chapter 13 of McKinsey’s 
book is titled “Applications to Statistical Inference.” This raises the ques- 
tion: Is game theory of importance to statisticians; that is, will many statis- 
ticians benefit by learning about game theory? 

My answer to this question is in the negative, although I enjoyed person- 
ally reading this book and I would recommend it strongly to anyone who 
wishes to learn about game theory.

My objections to game theory do not concern the mathematics, but 
rather the basic ideas of this field. The essentially new step taken by game 
theory was to bring strategy into the mathematical picture. Thus if two 
players A and B were engaged in a game, the analysis would have to consider 
their respective strategies. Now evidently A’s strategy is going to depend on 
B’s, and B’s strategy will in turn depend on A’s, so this gets into a merry- 
go-round. What is worse, all sorts of psychological considerations enter the 
picture, for A’s strategy depends on what A thinks B’s strategy will be, and 
so on. This psychological interaction is, of course, the heart of any game of 
strategy—it makes games fun. 

The approach chosen by von Neumann provides a very elegant mathe- 
matical formulation for game theory and gets rid of the messy psychological 
issues. However, the procedure comes perilously close to “throwing out the 
baby with the bath water.” What is done, basically, is to replace the two 
players by two computing machines or robots, and to give these robots 
special instructions. The gist of these instructions is: “Maximize your 
minimum expected gain.” 

The robot game described above would seem to be an appropriate model 
for intellectual games with high-caliber opponents. In a chess game, for
example, a player might very well maximize his minimum expected gain. 
Thus player A would consider his available moves and for each move try to 
envisage the best countermove by player B (i.e., A’s minimum expected 
gain). Player A would appropriately select as his own move the one which 
gave the most advantage even against the best defense. 
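The chess-style reasoning described above is the maximin rule: evaluate each of one's own moves against the opponent's best reply, then choose the move whose worst case is best. The sketch below covers pure strategies only. Von Neumann's theory also lets each player randomize among moves (mixed strategies), which this sketch omits. The payoff table is hypothetical.

```python
def maximin(payoffs):
    """Pure-strategy maximin for the row player.

    payoffs[i][j] is the row player's gain when he plays i and the opponent
    replies with j. Returns (best row, its guaranteed gain).
    """
    worst = [min(row) for row in payoffs]  # the opponent's best reply to each row
    best_row = max(range(len(payoffs)), key=lambda i: worst[i])
    return best_row, worst[best_row]

# Hypothetical payoff table: three moves for A, three replies for B.
print(maximin([[3, -1, 2], [1, 0, 1], [4, -2, 0]]))  # (1, 0)
```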

On the other hand, for many parlor games (and real life games) the robot 
model is of dubious utility. In actual games, player A should study player 
B’s style so as to take advantage of B’s mistakes. In game theory a player 
would use the same strategy against dub or expert. 

The association between game theory and statistics arises from the fol- 
lowing analogy. Suppose now that player A is a statistician. Suppose also 
that player B is “nature.” The statistician makes the first “move” in this 





BOOK REVIEWS 657 


game by doing an experiment. Nature’s answering “move” is a set of data. 
The statistician’s next “move” is to examine the data and make some decision 
(e.g., reject a shipment of parts). If the statistician makes the wrong decision
he pays a penalty (which may depend on the extent of his “error”). 

McKinsey recognizes that “nature cannot properly be conceived as trying 
to outwit us,” but suggests that “the player may be interested in determining 
what is the worst nature can do to him.” McKinsey then asserts, “Situations 
of this sort arise particularly in connection with statistics.” To buttress this 
statement McKinsey gives three examples, one of which is “to maximize the 
accuracy of the determination of a quantity for a given cost.” These exam- 
ples serve only to refute McKinsey’s assertion, for they are all “pure maxi- 
mization problems in the classical sense” where McKinsey himself admits 
“there is no question of countering the moves of another rational creature.” 

After this unpromising start, McKinsey proceeds to give a very simplified 
version of a problem in public opinion sampling. “A certain urn is known 
to contain two balls, each of which is either black or white. A statistician, S,
wishes to make a guess as to how many balls are black.” Suppose that if S 
guesses right he receives $100. If he misses by one he receives nothing. If 
he misses by two he must pay out $100. S may inspect one ball (or both), 
but each inspection costs him $50. 

It should be noted that this is not a problem in statistical inference. If one 
ball is inspected (i.e., the sample is taken), there is no attempt to use the 
sample to make inferences about the population. Indeed, the example is
such that one ball provides no information about the other ball. What has 
happened is that the statistical problem has been “simplified” out of exist- 
ence. 

Be this as it may, it is instructive to consider the game theory solution 
to the problem. The statistician is advised to behave as follows: he tosses a 
coin. If the coin shows heads, he announces that one ball is white and one 
is black without taking a sample. If the coin shows tails he examines one ball 
and guesses that both balls are of the same color as the one tested. 
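The arithmetic behind this advice can be checked directly from the stated payoffs: $100 for a correct guess, nothing for a miss by one, a $100 loss for a miss by two, and $50 per inspection. The review does not say how the inspected ball is chosen, so the sketch assumes it is drawn at random from the urn. Exact fractions are used to avoid rounding.

```python
from fractions import Fraction

PRIZE = {0: 100, 1: 0, 2: -100}  # gain by size of the miss
COST_PER_BALL = 50


def payoff(guess, n_black, inspected):
    """Net gain from guessing `guess` black balls after `inspected` inspections."""
    return PRIZE[abs(guess - n_black)] - COST_PER_BALL * inspected


def expected_gain(n_black):
    """Expected gain of the coin-toss strategy for an urn with n_black black balls."""
    # Heads: announce one black and one white, without inspecting.
    heads = payoff(1, n_black, inspected=0)
    # Tails: inspect one ball (assumed drawn at random) and guess both match it.
    p_black = Fraction(n_black, 2)
    tails = p_black * payoff(2, n_black, 1) + (1 - p_black) * payoff(0, n_black, 1)
    return Fraction(1, 2) * heads + Fraction(1, 2) * tails


for n in (0, 1, 2):
    print(n, expected_gain(n))  # 25 in every case
```

Under this assumption, the coin-toss strategy yields an expected gain of $25 whether the urn holds zero, one, or two black balls. The game theorist values exactly this indifference to nature's choice. It is also why the strategy makes no use of any knowledge about which urn is likely.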

The reason why the game theorist’s advice to the statistician is so queer 
is not hard to discern. The game theorist says in effect: “Don’t look at data 
or past experience (i.e., don’t try to learn nature’s strategy); consider the 
worst that nature might do instead of what nature is likely to do.” 

This advice hardly makes sense unless the statistician believes that the 
world is, quite literally, against him. 


A Theory of Psychological Scaling. Clyde H. Coombs. University of Michigan 
Engineering Research Institute Bulletin No. 34. Ann Arbor: University of Mich- 
igan Press, 1952. Pp. vi, 94. $1.75. Paper. 


Bert F. Green, Massachusetts Institute of Technology


THE scaling problem considered in this monograph is that of accounting
for the observed interrelationships among a set of qualitative variables 
by relating them to one or more hypothetical “underlying” variables. Pro- 











658 AMERICAN STATISTICAL ASSOCIATION JOURNAL, SEPTEMBER 1953 


fessor Coombs presents a general conceptual rationale for many of the 
available psychological scaling methods. 

The monograph begins with a general discussion of the theory of measure- 
ment. The logical properties of various types of scales are discussed with 
special emphasis on scales based on partial orderings of the objects being 
scaled. Professor Coombs introduces his approach to the scaling problem 
by suggesting two systems of parameters. “The genotypic system refers 
to an inferred, hypothetical, latent, underlying basis of behavior. The pheno- 
typic [system] is the manifest, the observed level of behavior.” The scaling 
problem is “to study the information contained in a set of phenotypic obser- 
vations to determine what can be inferred about the genotypic level.” 
Two genotypic variables are defined. $Q_{hij}$ is the measure of a stimulus, $j$,
on some attribute for an individual, $i$, at the moment, $h$. $C_{hij}$ is the measure
of an individual, $i$, on some attribute of a stimulus, $j$, at the moment $h$.
The phenotypic variable, which is a psychological magnitude subject to
observation, is defined as $P_{hij} = Q_{hij} - C_{hij}$. A set of postulates is provided
that relates these variables to some of the typical rating procedures used
in scaling techniques. For example, for fixed $h$ and $i$, the judgment “Stimulus
$j$ is preferred to stimulus $k$” is represented as $|P_{hij}| < |P_{hik}|$, while the judgment
“Stimulus $j$ has more of (some attribute) than $k$” is represented by $P_{hij} > P_{hik}$.
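A minimal sketch of the preference rule for one individual at one moment follows. It rests on two simplifications that are assumptions made here, not Coombs's. First, stimuli are placed on a single attribute by hypothetical values Q. Second, the individual's ideal value C is the same for every stimulus. Under these assumptions, preference runs in order of increasing |Q − C|. This is the "folded" ordering from which the unfolding technique works back.

```python
def preference_order(ideal, scale_values):
    """Order stimuli from most to least preferred for one individual.

    Stimulus j is preferred to k when |Q_j - C| < |Q_k - C|,
    with Q the stimulus value and C the individual's ideal value.
    """
    return sorted(scale_values, key=lambda s: abs(scale_values[s] - ideal))


def has_more(scale_values, j, k):
    """Attribute judgment: j has more of the attribute than k.

    With C assumed constant across stimuli, P_j > P_k reduces to Q_j > Q_k.
    """
    return scale_values[j] > scale_values[k]


# Hypothetical stimulus values on one attribute.
Q = {"w": 1.0, "x": 2.5, "y": 4.0, "z": 6.0}
print(preference_order(3.0, Q))  # ['x', 'y', 'w', 'z']
print(has_more(Q, "y", "x"))     # True
```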

Next, the genotypic and phenotypic parameters are defined. These 
parameters are the conceptual components of variance of the Q’s, C’s, and 
P’s. For example, the parameters based on Q, for each item j, are the variance 
of the Q’s within individuals, i.e., replications, and the variance between 
individuals. The latter is further divided conceptually into the variance 
accounted for by controlled factors, and the residual variance between indi- 
viduals. Analogous definitions are given for parameters based on the C’s and 
the P’s. The data obtained in a scaling experiment are to be classified 
according to which variance components are zero. After the general theory 
has been presented, it is related to two specific scaling methods, Coombs’ 
unfolding technique and the method of paired comparisons. 
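Classifying data by which variance components vanish can be illustrated for a single stimulus. This rough sketch uses assumptions of its own. The hypothetical data give each individual's replicated Q values, and population variances are used throughout. The further split of between-individual variance into controlled and residual parts is omitted.

```python
from statistics import mean, pvariance


def variance_split(q_by_individual):
    """Split variation in Q for one stimulus into within and between parts.

    q_by_individual maps each (hypothetical) individual to that
    individual's replicated Q values for the stimulus.
    """
    reps = list(q_by_individual.values())
    # Within individuals: average spread over replications.
    within = mean([pvariance(r) for r in reps])
    # Between individuals: spread of the individual means.
    between = pvariance([mean(r) for r in reps])
    return within, between


consistent_but_different = {"i1": [2, 2, 2], "i2": [4, 4, 4], "i3": [6, 6, 6]}
alike_but_noisy = {"i1": [1, 3], "i2": [1, 3]}
print(variance_split(consistent_but_different))  # (0, 2.666...)
print(variance_split(alike_but_noisy))           # (1, 0)
```

In the first data set the within-individual component is zero, because each person is perfectly consistent, while individuals differ from one another. In the second data set the reverse holds.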

The theory is not presented in an attempt to unify the field of scaling. Its 
purpose is to give a sound logical basis for certain types of scales—especially 
scales concerned with ordinal relationships. For example, Guttman’s scalo- 
gram technique may be encompassed by the theory, whereas Lazarsfeld’s 
latent structure analysis cannot be treated adequately. Professor Coombs 
voices a prejudice against stochastic models for scaling. He believes that the 
use of statistical concepts in scaling models “is to build an actuarial science 
at the possible cost of a science of individual behavior.” This reviewer feels 
that many of the factors influencing attitudes and preferences are at present 
uncontrollable; any attempt to make a detailed study of the individual case 
in the face of a large error variance seems optimistic. However, the ultimate 
worth of a general theory should be judged by its utility in consolidating a 
number of special techniques, and in suggesting new areas for investigation. 
In this regard, it may be noted that the monograph is to some extent a status 

report, since it contains many references to theoretical and empirical studies 
now in progress. 

Professor Coombs’ monograph is written for a special professional audi- 
ence. However, even specialists in the field will find that the monograph is 
not easy reading. This is due in part to an excess of indigenous jargon. Sec- 
ondly, concrete examples are used sparsely. It would have been extremely 
helpful to have more specific instances of the conceptual theory. The ex- 
amples provided in the last two chapters are helpful, but not sufficient. 

Despite these shortcomings, Professor Coombs’ work is an important 
contribution to the theory of psychological scaling. Workers in the field of 
scaling will find many interesting and stimulating ideas in this short mono- 
graph. 


Effective Management through Probability Controls: How to Calculate Man-
agerial Risks. Robert Kirk Mueller (Assistant General Manager, Monsanto
Chemical Company, Plastics Division). New York: Funk & Wagnalls Company
in association with Modern Industry Magazine, 1950. Pp. xvi, 310. $5.00. 


Two Reviews follow: 


J. C. Bain, Associated Merchandising Corporation 


THE aims of this book are: First, to cite enough examples to prove the
case that statistical control utilizing the law of probability is really a 
technique for modern management; second, to show that this technique can 
be comprehended by personnel at the executive level who may not have a 
background of mathematics, statistics or technical training; and third, to 
show that the opportunities for applying this management tool are not 
confined to manufacturing alone. 

The author attempts to achieve these aims in five sections: I. How to make 
the most of the significant—introductory in character; II. Why executives are 
interested in statistical probability—an account of benefits to executives and 
case histories; III. Brass hat facts about statistical control—operational 
benefits and a once-over-the-field of statistics very lightly, especially statisti- 
cal quality control; IV. “But my business is different”—applications other 
than to manufacturing and misapplications; V. Topside responsibility and 
participation in a control program—organizing a program. 

Among the good features of the book are its illustrations of charts and 
demonstration equipment, many of which are quite apposite, and its numer- 
ous references to mathematical history, recalling a feature often omitted 
entirely from an education in mathematics. 

An extremely long list of examples of statistical quality control in various 
companies provides suggestions for the idea hunter. Perhaps the most valu- 
able feature is a detailed documentation of the installation of a system in 
the Monsanto Company which, intentionally or unintentionally, brings home 
the fact that it is much more of an undertaking than many executives imag-
ine.

The fundamental thesis of the book is one with which few will quarrel, viz, 
statistical quality control is a good thing. This might serve as a summary of 
the book in one sentence. It is doubtful if it requires three hundred and ten 
pages to put it across to the reader. The book succeeds in the first and third 
of its declared aims but in respect of the second the term “comprehend” must 
be taken in a wholly superficial way. 

Against these virtues must be reckoned deficiencies which are numerous 
and obvious. 

The title is a patent misnomer. It is a matter of extreme doubt whether 
statistical quality control as here outlined is what the august body referred 
to as “management” will identify as a typical managerial risk. Managements 
notoriously identify themselves far more with the problems requiring experi- 
mental design, for instance. One will not be able, after reading the book, to 
calculate a managerial risk or any other kind of risk for the reason that one 
is never told how to do it. 

The statistical content is quite trifling, being purely descriptive. A descrip- 
tion of “factorial experiments,” for instance, is illustrative. It requires one 
sentence: “The solution of the problem is a technical one concerned with 
chi-square values, analysis of variance, degrees of freedom, interaction resid- 
uals, and many other of the mathematical aspects of such work.” 

Here the superficiality to be attached to “comprehend” in the second aim 
of the writer is apparent. About the most the executive who reads the book 
could hope for is to recognize an occasional term if he ever went to a statisti- 
cal meeting. Of course, “Addition, subtraction, long division and an occa- 
sional square root thrown in is about all that is needed for an executive to 
become reasonably familiar with statistical techniques.” 

A conventional view of the role of the executive as one “who may super- 
impose his basic judgment” must seem, to some readers, to be strangely at 
variance with “the scientific approach to management (which) is replacing 
management by Indian-medicine-man methods.” 

One is repeatedly warned against the evils of having on one’s staff a 
“mathematician or statistician who is interested in mathematics only for 
mathematics’ sake.” Surely the advice is gratuitous, but one can fairly hear
the mind of a certain type of reader clicking out an ugly conclusion. At the 
same time “It is also advisable to have someone on the staff qualified to han- 
dle the more mathematical aspects of the latest statistical techniques.” 

The numerous examples might well have been greatly reduced in number. 
The consequent demands of brevity beget obscurity, oftener than not. 

The most serious criticisms are two in number: Is the type of appeal effec- 
tive? and Is statistics put in a proper light? 

With respect to the first, there is a deplorable implication that the proper 
way to sell the idea to the management is by cajolery and flattery, although 
there are probably some who will buy, and that a smattering of ignorance is 
all one needs to run a quality control installation. In addition, it is a fact of
experience that managements do not regard promised savings as a commen- 
dation of their own past performances and are more attracted by promises of 
production and freedom from trouble. 

With respect to the second, the executive lingers indefinitely, lacking sharp 
distinctions: (1) between statistics (singular) and statistics (plural); and (2) 
between statistics as a technique applied to a type of phenomenon and statis- 
tics as a technique applied to masses of numerical information. The complete 
and ill-founded assurance which most executives feel about their knowledge 
of the second element in each of these distinctions will certainly becloud 
their perception of the first. 

One is conscious of an intensification of the need for a short, clear and pre- 
cise work on statistics for executives but the talent which produces such things 
is rare. One cannot feel that the advice given in this book is either as good or 
as articulate as “Hire a good statistician and relax.” 


Paul S. Olmstead, Bell Telephone Laboratories


THE wide variety of applications of statistical quality control discussed
in this book may be of interest to some statisticians. The book also con-
tains a number of charts and photographs that illustrate how particular
features of SQC may be presented convincingly to management. 

Unfortunately, the book gives ample evidence that the author is confused 
about the present status of SQC. This is in part apparent from the titles of 
the five sections in which the book is divided: I How to make the most of the 
significant; II Why executives are interested in statistical probability; III 
Brass hat facts about statistical control; IV “But my business is different”; 
V Topside responsibility and participation in a control program. The author
seems afraid to use the term, Statistical Quality Control, that has been ac- 
cepted so generally by management. Instead, he makes an unconvincing 
attempt to bring in a new term, Probability Control. This has no real mean- 
ing for an executive whose primary interest is the best quality for the least 
cost. 

This book is not recommended reading either for the busy executive or for 
the beginner in SQC. 


Factors Affecting the Demand for Consumer Installment Credit. Avram Kissel- 
goff. New York: National Bureau of Economic Research. 


George E. O’Rourke


THIS paper should command the attention of the professional
economist and statistician, both because it deals with a timely subject, 
and because it is an example of the application of the econometric method 
to a specific problem. We would suggest that its methodological significance 
outweighs the importance of the conclusions drawn. 

The statistical analysis of the demand for instalment credit casts into 
sharp focus the difficulties which beset one who attempts to utilize a method 

which combines both the theoretical and the statistical approach in treating 
economic data. The author made use of two statistical techniques in esti- 
mating the parameters of the structural equation which determines the 
demand for instalment credit. The first of these considers the equation as a 
part of a general system, and estimates the parameters from reduced form 
equations. This method leaves the estimates free of bias, but complicates the 
computational difficulties, restricts the number of variables which can be 
handled, and raises the identification problem. As an alternative, Kisselgoff 
estimated the parameters by the use of multiple correlation, and ignored the 
fact that the equation for the determination of instalment credit is a part of 
a much larger and more complex system. 
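The review does not give Kisselgoff's functional form. Assuming a constant-elasticity (log-log) equation, the single-equation least-squares approach can be sketched as follows, with each slope then read as an elasticity. The series are synthetic and noise-free, built with known elasticities, so the fit simply recovers them. Because they are generated from one equation, they cannot show the simultaneous-equation bias discussed here.

```python
import numpy as np


def log_log_elasticities(y, *regressors):
    """Fit log y = a + sum_k b_k log x_k by least squares; return the b_k.

    Each slope b_k is the elasticity of y with respect to x_k. Like the
    single-equation approach, this treats the regressors as given and
    ignores the wider system the equation belongs to.
    """
    X = np.column_stack([np.ones(len(y))] + [np.log(x) for x in regressors])
    coef, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)
    return coef[1:]


# Synthetic series with known elasticities (income +1.5, monthly payment -1.2);
# not Kisselgoff's data.
rng = np.random.default_rng(0)
income = rng.uniform(50, 150, size=20)
payment = rng.uniform(5, 15, size=20)
credit = 3.0 * income**1.5 * payment**-1.2
print(log_log_elasticities(credit, income, payment))  # approximately [ 1.5 -1.2]
```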

The author feels that the bias introduced by the use of the single equation 
approach may not be significant for the type of study he is undertaking (p. 
29). This reviewer is inclined to agree with him. While the bias is there, it is 
certainly insignificant when compared to the inexactitudes caused by the 
crudities of the data, the exclusion of a large number of important factors 
because of the limited number of observations, and the undeniable fact that 
the structure itself changes over time, and in a way which can only roughly 
be accounted for in a trend term. Kisselgoff’s restriction of the period of 
study to the pre-war era is brought about by a recognition of these short- 
comings. 

The need to rely in large part on the historical and institutional approach 
in economic investigations cannot be eliminated by the wholesale adoption 
of the econometric approach. The author gives implicit recognition to this 
fact when he refers to the stimulating effect of the veterans’ bonus on instal- 
ment credit demand, and the restrictive influences of the introduction of 
Regulation W (p. 42). 

One could perhaps suggest that the study should have included other ex- 
planatory variables, and should have been extended into the post-war period. 
However, these shortcomings, if such they be, may be attributed to the un- 
availability of data, and to the amount of personal judgment involved in 
selecting the variables to be considered as relevant. 

The results of Kisselgoff’s analysis, although not startling, are at least 
reassuring to those who on a priori grounds would have indicated a relevance 
for those factors which he finds significant. The relative importance of cur- 
rent income as a determinant of the demand for instalment credit is not sur- 
prising, particularly when we consider the high income elasticity of consumer 
durables. The high negative elasticity of demand with respect to the size of 
the required monthly payment is perhaps more revealing, and might prove 
of some interest to those who are responsible for monetary policy. The least 
satisfactory feature of Kisselgoff’s work is his method of accounting for the 
liquid asset effect in a number of models through the level of income lagged 
one year. This reviewer would prefer to see the liquidity element handled 
separately, and more explicitly. 

However, as it stands this paper is well worth the time required for a 
careful reading, and should serve to point out what can be accomplished, and 
what cannot be accomplished in this difficult field. 


Agricultural Policy of the United States. Harold G. Halcrow. New York: Prentice- 
Hall Inc., 1953. Pp. vi, 458. 


Ivan M. Lee, University of California (Berkeley)


THE material in this book is presented in three parts. Part I is given over
to a discussion of what the author calls the agricultural setting. Population
trends and future prospects, trends and relationships in selected subaggre- 
gates of agricultural production, and the behavior of aggregate agricultural 
income over time are summarized briefly in this part. Most of the remainder 
of Part I is devoted to a diagrammatic and elementary numerically illustrated
discussion of selected economic concepts such as supply, demand, elasticity, 
and costs. Part II is very brief, containing the author’s version of a useful 
classification of the objectives of agricultural policy under the headings: 
(1) increasing efficiency, (2) raising and stabilizing farm income, and (3) im- 
proving social welfare. Part III occupies about one-half of the book. In this 
part a wide range of government legislation affecting farmers is discussed 
under appropriately chosen chapter headings. 

In a review of a book which is offered as a textbook in agricultural policy, 
it would seem appropriate to pay some attention to the question of what 
constitutes the field of agricultural policy. The author, in his opening remarks
in Part III (p. 208), suggests lines along which agricultural policy as a field 
of study might, in the opinion of this reviewer, be fruitfully developed: 
“. . . The student of policy must become a student of economics, sociology,
and political science, as well as several other subjects if he wishes to obtain a 
broad understanding of the field.” Having recognized the broader aspects of 
the field, the author chooses to restrict his analysis to the much narrower 
point of view of economics. He states (p. 208): “Our emphasis is on eco- 
nomics. Our problem is to recognize the economic and political forces at 
work and to bring economic analysis to bear on the problems under discus- 
sion... . We cannot consider in one book the implications for policy of all 
various disciplines such as philosophy, political science, and sociology... . 
We shall talk about political interests and pressure groups. But we 
shall place our major emphasis on the economic analysis of the programs that 
are formulated to carry out the objectives of policy.” The author’s develop- 
ment is in the main consistent with this stated intention. The tools outlined 
in Part I are selected from those to which the beginning student is subjected 
in an elementary course in economic principles. The objectives of policy in 
Part II are in the main phrased in language which facilitates discussion in 
terms of economic logic. Finally, the analysis of programs in Part III is de- 
veloped primarily along economic lines. 

The author’s interest in narrowing his subject to more manageable pro- 
portions is understandable. On the other hand, the main support for the recog- 
nition of agricultural policy as a separate field of study would seem to come 
from the desirability of bringing various ideas and techniques from several 
related fields to bear on the subject under analysis. A textbook in the field 
would appear a most appropriate place to develop this kind of an integrated 
approach. 

Considered from the narrower viewpoint of economic analysis, several re-
marks seem pertinent. First, the level of analysis is quite elementary. In the
preface the author indicates that the book is designed: (1) to serve readers 
with little previous exposure to economic theory, and (2) to serve as a basis 
for the development of more advanced courses in policy. With respect to the 
first objective, this reviewer is inclined to question whether an economic 
analysis of agricultural policy can be effectively handled at this elementary 
level. A very minimum prerequisite of one course in economic principles 
would seem essential. With this background several chapters included in Part 
I in the present form could be omitted. With regard to the second objective, 
one cannot escape the conclusion that this text would need to be heavily sup- 
plemented in an advanced course. 

A second point concerns the absence of sufficient recognition of some of the 
limitations of the policy researcher’s analytical tools. The elementary stu- 
dent in particular upon reading this book is likely to carry away the impression 
that the analytical tools are a good deal sharper than is in fact the case. This 
applies from both the economic and statistical points of view, but attention 
here is given to the latter. Economic concepts are quantitative concepts. 
A significant element in the analysis of various agricultural programs in- 
volves the estimation of relevant quantitative economic relations. The statis- 
tical theory of estimation of economic relations is admittedly rather involved 
and it would seem inappropriate to suggest that an attempt should have 
been made to treat it systematically in an elementary text in agricultural 
policy. On the other hand, the presentation of a number of estimates of co- 
efficients of demand elasticity as is done in Chapter 6 of this book with no 
caution regarding their tentative character seems equally inappropriate. If
such material is to be presented at all in an elementary book of this nature, 
there would seem also to be some obligation to include an elementary exposi- 
tion of the relevant statistical problems of estimation. A defendable alterna- 
tive in the present case would have been to omit this material since the book 
is not in the main quantitatively oriented. At the more advanced level a 
strong case can be made for an integrated treatment of econometric method, 
including the more recent development, in a textbook in agricultural policy. 

Another respect in which statistical methodology deserves some attention 
is in connection with errors in the basic data commonly used in quantitative 
research in agricultural policy. Data on prices, production, employment, 
etc., used extensively by policy researchers, and appearing mainly in chart 
form in the present book, are estimates based on methods which leave a 
cloud of uncertainty regarding the errors of estimation. Those responsible 
for these estimates recognize their fallibility although no measures of error 
are provided as a guide to the user. The methods of estimation employed 

depart in important respects from those dictated by sound statistical theory. 
It is not suggested here that the writer of a textbook in agricultural policy 
should be charged with the responsibility of developing measures of error. He 
is, however, under some obligation to recognize the presence of errors as an 
element in even the simplest type of quantitative analysis of the agricultural 
setting and of various agricultural programs. A common quantitative tech- 
nique in agricultural policy involves, for example, comparisons of certain 
aggregates or averages in different segments of the economy or different areas 
within agriculture. Such comparisons might serve quite appropriately to 
suggest frictions and maladjustments in the functioning of the system. In- 
comes per worker in agricultural and nonagricultural employment may serve 
as an example of a commonly employed comparison. Both income and em- 
ployment estimates are subject to statistical errors of estimation. In addition, 
particularly in connection with agricultural employment, conceptual or defi- 
nitional differences account for substantial discrepancies in current estimates 
available (BAE and Bureau of the Census). When various sources of error 
are taken into account, one wonders just how substantial a difference must 
be before it takes on genuine quantitative significance. In the case of in- 
comes per worker significant differences may well remain after allowance 
for statistical and conceptional errors. Other comparisons could be cited 
where this may or may not be the case. The point is raised not to question 
the aggregative comparative technique as a device for suggestive analysis 
but rather to suggest that in the development of agricultural policy as a field 
of study, proper attention to the statistical point of view would seem a con- 
structive innovation. 
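The question of how large a difference must be can be framed as the familiar ratio of a difference to its standard error. This sketch assumes two independent estimates with known standard errors. The figures are invented for illustration, since no measures of error accompany the published series. Definitional discrepancies, such as those between BAE and Census employment estimates, are not sampling errors and are not captured by the ratio.

```python
from math import sqrt


def difference_ratio(est_a, se_a, est_b, se_b):
    """Difference between two independent estimates divided by its standard error."""
    return (est_a - est_b) / sqrt(se_a**2 + se_b**2)


# Hypothetical incomes per worker (dollars) and their standard errors.
print(difference_ratio(2000, 150, 1500, 200))  # 2.0
```

Under a normal approximation, a ratio of about 2 corresponds roughly to the conventional 5 per cent level. A smaller ratio would leave open the possibility that the difference reflects error alone.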


Causes of Decline in the World’s Cotton Textile Trade. Osaka, Japan: Institute 
for Economic Research, Toyo Spinning Co., Ltd., 1952. Pp. 48. 


Karl A. Fox, Bureau of Agricultural Economics


THIS brief study was prepared for the All Japan Cotton Spinner’s Associ-
ation in connection with discussions at the International Cotton Confer-
ence held in 1952. The foreword states that “The manuscript in Japanese has
had the examination and approval of the members of the Japanese delega- 
tion to the Conference.” 

This is primarily an economic analysis. The statistical methods used are 
simple, and the terminology in some passages is more pretentious than de- 
scriptive. For example, a table dividing the total volume of world trade in 
cotton textiles into imports and exports by each of two major groups of 
countries is described as an “input-and-output relation table.” A tabular 
comparison of changes in rayon and cotton consumption is also said to em- 
ploy “the input-and-output analysis.” The only resemblance to input- 
output analysis is that the sums of row and column totals in the tables are 
both equal to the element in the lower right hand corner. 

In some cases it is not clear what countries and time periods are used in a 
given analysis, nor even what method of analysis is used. On page 31, correla- 

tion coefficients, income elasticities, and price elasticities of demand for cot- 
ton textiles are reported with no indication of the number of observations 
underlying each, or of their standard errors or levels of significance. Price 
elasticities are reported for Japan for the periods 1920-24, 1925-31 and 1920- 
31 as a whole. The price elasticity of cotton textile consumption in 1920-24 
is given as −0.58 and in 1925-31 as −0.69. The income elasticity in 1920-
24 is reported as 0.43 while that in 1925-31 is reported as 1.27. If these re-
sults were based (as seems evident) upon multiple regression analyses, the
analysis for the first period left two degrees of freedom while that for the 
second period left four. The apparent drastic change in income elasticities 
between the two periods is probably not significant. Yet on page 33, the fol-
lowing inferences are drawn from these analyses: “In the first half of the peri-
od in question when income was generally low, price was the more important
factor, whereas in the latter half of the period when the average income has 
increased, income elasticity was greater than price elasticity, from which we 
learn that income rather than price was the more dominant factor.” 
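The degree-of-freedom counts follow from simple arithmetic. The sketch assumes annual observations and three fitted coefficients (a constant plus price and income terms), which is what the reviewer's figures imply.

```python
def residual_df(first_year, last_year, n_coefficients=3):
    """Residual degrees of freedom for an annual regression over a closed range of years.

    n_coefficients defaults to 3: a constant plus price and income terms (assumed).
    """
    n_obs = last_year - first_year + 1
    return n_obs - n_coefficients


print(residual_df(1920, 1924))  # 2
print(residual_df(1925, 1931))  # 4
```

For comparison, the two-sided 5 per cent critical values of Student's t are about 4.30 with 2 degrees of freedom and 2.78 with 4. Only very large ratios of coefficient to standard error could therefore be called significant in either period.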

The elasticity coefficients, of course, do not show which factor was “more 
dominant” during the period in question; furthermore, we would ordinarily 
expect the income elasticity of demand for textiles (in terms of yards of 
cloth consumed) to be smaller at high than at low income levels. An analysis 
of cotton textile consumption in Indonesia during the years 1931-38 is also 
reported and yields a price elasticity of −0.63 and an income elasticity of
+0.52. Again no standard errors are shown. On page 34 the following infer- 
ence is drawn: “As is readily discernible from these estimates, demand in 
the under-developed countries is more apt to be affected by price fluctuations 
than by changes in national income.” 

The study as a whole gives a specious appearance of carefulness and ob-
jectivity. It is essentially an economic brief pleading a special cause, and ad-
vancing proposals which would be of primary benefit to the cotton textile
industry of Japan. While it contains more statistics, and perhaps a more 
reasonable interpretation of them, than is common in economic briefs pre- 
pared in advocacy, upon closer examination the statistical analysis is found 
to be extremely weak and the inferences drawn from it largely unwarranted. 
It would be interesting to see what a competent and objective analyst, or a 
team consisting of a foreign trade specialist and a statistician, could do with 
the same basic material. 


A Short Scale for Measuring Farm Family Level of Living: A Modification of 
Sewell’s Socio-Economic Scale. John C. Belcher and Emmit F. Sharp. Stillwater, 
Oklahoma: Oklahoma Agricultural Experiment Station, 1952. Pp. 22. 


Fred L. Strodtbeck, University of Chicago


Those concerned with the determination of the socio-economic status
of rural, or non-rural, families will wish to examine this revision of
Sewell’s 1940 scale.¹ The authors have meticulously correlated the presence





1 Sewell, William H., The Construction and Standardisation of a Scale for the Measurement of the 
Socio-Economic Status of Oklahoma Farm Families, Stillwater, Oklahoma, April, 1940. 







BOOK REVIEWS 667 


and absence of 29 characteristics in a sample of 825 open-country Oklahoma 
families. The characteristics examined had previously been found to be the 
most consistently discriminative from the set of 123 items used by Sewell. 
The distinctive contribution of the present study is a factor analysis which 
reveals that Sewell’s list contained an economic cluster and a cluster of items 
pertaining to religious participation. The writers accordingly select the items 
from the economic cluster, rework the weights of the alternatives, and then 
present a brief, easily administered, “level of living” scale. The ten items of
the short scale treat construction of house, plumbing, lighting and refrigera- 
tion facilities and similar matters. 

From a more general point of view, this study represents a reversal of 
current trends in the sociological analysis of “class” phenomena. Most recent 
efforts have been directed toward creating an easily administered but factori- 
ally complex scale which would maximally reproduce judgments of par- 
ticipants in the community or similar criteria. From the standpoint of broad 
demographic investigation (such as the origins of high level talent or the inci- 
dence of schizophrenia) we are very greatly in need of a status classification 
for agricultural populations which would articulate with those we use in 
urban analysis. Insofar as the present study throws into sharp relief the 
possibility that factorially pure scales with unambiguous items may neces- 
sarily be highly specific in the cultural traits they involve, we are forewarned 
that we may be led further from some of the engineering objectives we seek 
to attain by socio-economic indices if we insist on single factor sub-scales at 
this time. 

Within the limited scope in which the authors worked, it is to be regretted that no
systematic investigation was made of the minimum number of items which 
would essentially reproduce their scale. A practical administrator might also 
wish to know the items which are most or least sensitive to level of living 
fluctuations of the type associated with droughts or new farm parity pro- 
grams. It is my feeling that the items they have used would lag well behind 
decreases in income. 
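The investigation the reviewer asks for, finding the fewest items that essentially reproduce the full scale, could be approached in several ways. One simple possibility, sketched here as an assumption rather than anything Belcher and Sharp did, is greedy forward selection: repeatedly add the item whose inclusion most raises the correlation between the partial score and the full-scale score, and stop once a chosen threshold is reached. The function names, the input layout (one row of weighted item scores per family) and the 0.95 threshold are all hypothetical.

```python
def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series has no variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def minimal_item_subset(scores, target_r=0.95):
    """Greedy forward selection of scale items.

    scores[f][i] is the weighted score of family f on item i. At each step,
    add the unused item that most raises the correlation between the partial
    score and the full-scale score; stop once target_r is reached.
    Returns (selected item indices, achieved correlation).
    """
    n_items = len(scores[0])
    full = [sum(row) for row in scores]  # full-scale score per family
    selected, r = [], 0.0
    while len(selected) < n_items and r < target_r:
        best_item, best_r = None, -2.0
        for i in range(n_items):
            if i in selected:
                continue
            partial = [sum(row[j] for j in selected + [i]) for row in scores]
            candidate_r = pearson(partial, full)
            if candidate_r > best_r:
                best_item, best_r = i, candidate_r
        selected.append(best_item)
        r = best_r
    return selected, r
```

Greedy selection is a heuristic: it need not find the smallest possible subset, and any subset it proposes would have to be checked on a fresh sample of families.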


The Labor Force in California: A Study of Characteristics in Labor Force, Em- 
ployment and Occupations in California, 1900-1950. Davis McEntire. Berkeley: 
University of California Press, 1952. Pp. x, 101. $2.50. Paper. 


Gladys L. Palmer, University of Pennsylvania


The Institute of Industrial Relations of the Berkeley branch of the
University of California has broadened the base of its studies of wages and
collective bargaining problems in a recent study of changes in the labor 
force of the state of California from 1900 to 1950. Professor McEntire’s 
analysis of a half century of changes in the labor force and structure of 
employment in California provides a background for the understanding of 
many labor market problems in “the most dynamic state segment of the 
national labor force.” 

The first chapters discuss changes in population and labor-force participa- 





668 AMERICAN STATISTICAL ASSOCIATION JOURNAL, SEPTEMBER 1953 


tion rates and their effects on the composition and size of the California labor 
force. Later chapters outline long-term trends in the occupational and 
industrial distribution of employment and the impact of the war and defense 
production programs. Racial differentials in labor-force rates and employ- 
ment attachments are also considered. 

These data provide a skeleton structure for research in many labor mar- 
ket problems which can most appropriately be studied in California. For 
example, a major force in the growth of California’s population and labor 
force from 1900 to 1950 has been migration. Although the extent of immigra- 
tion is known to have varied from decade to decade, it would be valuable to 
know how much its character has changed. The extent to which or the rapid- 
ity with which migrants take on the labor force characteristics of workers in 
places of destination as against places of origin might be more readily studied 
here than elsewhere, because of a relatively large volume of net migration. 
Other hypotheses about the propensity to migration on the part of workers 
in different occupational groups or the influence of unemployment or of
wage differentials need testing. 

A visitor to California is impressed by the combinations and permutations 
by which California families earn a living. They appear to be more varied 
in this state than in others. If true, this variety may stem from the seasonal 
character of some industries and consequent irregularity of employment or 
indeterminateness in the trends of the state’s economy, as well as other 
forces. But it is not yet clear whether California is on its way to becoming 
an industrialized state or whether its future manufacturing development may 
be limited. A study now in process may answer the latter question but it is 
hoped that the Institute may see its way clear to include some of the prob- 
lems noted in its future research program. 


The Pattern of Age at Marriage in the United States, Vols. I and II. Thomas P. 
Monahan. Philadelphia: Stephenson-Brothers, 1951. Pp. vi, 451. $4.00. 


Paul H. Jacobson, Metropolitan Life Insurance Company


This statistical study, the author’s doctoral dissertation in sociology at the
University of Pennsylvania, is the first in many years which is devoted
exclusively to marriage in the United States. The principal objective was to 
determine the long-term trend in age at marriage, and at the same time to 
throw some light on correlated factors such as occupation, education, nation- 
ality, race, residence, and the law. Toward this end, Dr. Monahan has 
drawn liberally on hundreds of sources, including publications of the Bureau 
of the Census, as well as contributing his own sample tabulations of New 
Jersey marriage records dating back to 1848. 

The author is extremely critical of past studies on age at marriage and 
believes that no conclusions on the long-term trend can be reached from 
available data. To support this thesis, he presents what appears to be inter- 
nally inconsistent evidence. In this reviewer’s opinion, however, the “incon- 







sistency” is due largely to variations in the degree to which the data are re- 
fined in different parts of the book. Thus, in the latter part of the first volume, 
when allowance is properly made for the changing age composition of our 
population, he does find indications of a decline in age at marriage since the 
turn of the century. However, these findings, although in conformity with 
prevailing opinion, cannot be accepted as conclusive evidence of the long- 
term trend, since the data are limited to widely separated periods of time 
for only a few individual states—evidence which is hardly representative of 
the secular trend for the country. 

The author would have done better to approximate the annual age-specific
marriage rates for the country by assembling all data available for the 
period studied. With such estimates it would have been possible to trace the 
total experience of a generation and then to draw conclusions regarding the 
trend in age at marriage. Dr. Monahan dismisses as inadequate the “census” 
method of determining the median age at first marriage (derived from popu- 
lation statistics for all persons ever married), yet the “census” method tends 
to approximate what the generation method would have shown for the period 
since 1890. 
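For readers unfamiliar with the “census” method, a simplified version can be sketched: take the proportion ever married at each age, treat the proportion at the oldest age as the share who ever marry, and interpolate the age at which half of that share has married. This is one common textbook variant, assumed here for illustration; the Bureau of the Census procedure of the period may differ in detail, and the function name is hypothetical.

```python
def median_age_at_first_marriage(ages, prop_ever_married, ultimate=None):
    """Estimate median age at first marriage from proportions ever married.

    ages: ascending ages (or age-group midpoints).
    prop_ever_married: proportion ever married at each of those ages.
    ultimate: share who ever marry; defaults to the oldest age's proportion.
    Returns the age at which half of `ultimate` is reached, found by
    linear interpolation between adjacent ages.
    """
    if ultimate is None:
        ultimate = prop_ever_married[-1]
    half = ultimate / 2
    points = list(zip(ages, prop_ever_married))
    for (a0, p0), (a1, p1) in zip(points, points[1:]):
        if p0 <= half <= p1 and p1 > p0:
            return a0 + (a1 - a0) * (half - p0) / (p1 - p0)
    raise ValueError("half of the ultimate proportion is not bracketed by the data")
```

Because a single census mixes many generations, such a figure is only a proxy for cohort experience, which is why the reviewer's comparison with the generation method matters.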

With the population data for the proportions ever married arranged on a 
generation basis, it would be hard to refute the hypothesis, which the author 
refuses to accept, that the age at marriage rose during and immediately 
after the Civil War and that it did not begin to decline again until just before 
the turn of the century. In other words, the trend appears to have been re- 
versed when persons born around 1875 reached the usual age for marriage. 
With proper evaluation and organization of his material, Dr. Monahan would 
have been in a better position to confirm or contradict this hypothesis, long 
current among many researchers. 
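Arranging the data “on a generation basis” amounts to re-indexing each census figure by approximate year of birth (census year minus age), so that successive censuses follow the same cohort as it ages. A minimal sketch, assuming single ages or age-group midpoints and a hypothetical input layout keyed by (census year, age):

```python
from collections import defaultdict


def arrange_by_generation(table):
    """Rearrange census proportions ever married into birth cohorts.

    table maps (census_year, age) -> proportion ever married.
    Returns {birth_year: [(age, proportion), ...]} with entries sorted by age;
    birth_year = census_year - age is approximate for grouped ages.
    """
    cohorts = defaultdict(list)
    for (census_year, age), proportion in table.items():
        cohorts[census_year - age].append((age, proportion))
    return {year: sorted(rows) for year, rows in sorted(cohorts.items())}
```

Reading along one cohort, for example persons born around 1875, then shows how the proportion married rose with age for that generation, which is the comparison the reviewer has in mind.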

No doubt, the literature is “honeycombed with misstatements of fact 
and dubious results,” but in this reviewer’s opinion the author has not done 
much to clarify the situation. The reader will not find the trend in age at 
marriage, nor even of total marriages, in this book. The absence of an index 
and a very sketchy table of contents also detract from its value. Neverthe- 
less, if the book stimulates action to remedy the deficiencies of available 
data on this important subject, Dr. Monahan will have made a lasting contri- 
bution. His bibliography, covering 100 pages, is the most comprehensive pub- 
lished to date, and should prove of interest and value to other investigators 
in this field. 


Design for a Brain. W. Ross Ashby. New York: John Wiley and Sons, Inc., 1952. 
Pp. ix, 260. $6.00. 


A. S. Householder, Oak Ridge National Laboratory


“I hope to show that a system can be both mechanistic in nature and yet
produce behavior that is adaptive.” This is the goal the author sets for 
himself on page 1. The principle upon which such a system operates, however, 







is “a principle hitherto little used in machines.” The system is called “multi- 
stable,” and it consists of many subsystems called “ultrastable.” 

Before developing the notions of ultrastability and multistability, the 
author discusses the meaning of stability and equilibrium; defines adaptive 
behavior as that which “maintains the essential variables within physiologi- 
cal limits”; distinguishes variables (which define the state of the system) 
from parameters (which describe the situation in which it is placed); and 
introduces a number of special terms. In particular an absolute system is 
defined, and the definition is shown to be equivalent to the condition that the 
behavior of the system is governed by a system of ordinary differential
equations in which time does not appear explicitly. Also functions are classi- 
fied as step-functions, part-functions (with finite intervals of constancy), 
full functions (continuous and having no interval of constancy), and null- 
functions (everywhere constant). 

Now an ultrastable system is defined as “one that is absolute and contains 
step-functions in a sufficiently large number for us to be able to ignore the 
finiteness of the number.” Consider the organism, for the moment, as an 
ultrastable system, subject to some set of external conditions. The system 
may be in a stable equilibrium with its variables undergoing no change; or 
the variables may be changing but within limits; or, finally, at least one 
of the variables may be approaching a level that would be injurious to the 
organism. In an ultrastable system one may expect that before that level 
is reached the system will encounter a “critical state,” at which a step-func- 
tion changes value. It is then as though a new set of differential equations 
takes over, the kinetic properties of the system undergo a sudden change, 
and we have, in effect, a different system upon our hands. In the system thus 
altered, it may happen that now the system approaches a steady state with 
the variables confined between physiological limits. But if not, then a new 
critical point may be reached, at which there occurs another step-function 
change. If eventually, after one or more such changes, the system reaches a 
steady state before a variable actually reaches a level the organism is unable 
to tolerate, then the organism has successfully adapted to its present
environment. If not, it succumbs, or undergoes injury in some degree.
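The mechanism can be imitated with a toy model. In the sketch below, an illustration under stated assumptions and not Ashby's own construction, a single essential variable x obeys dx/dt = g x, an absolute system since time does not enter explicitly, and the gain g plays the part of a step-function. Whenever |x| reaches the critical limit, g jumps to a value drawn at random, and the system settles once a stabilizing (negative) value turns up. Holding x at the boundary while a new value is tried is a simplification of this sketch; the gain values, limit and step size are arbitrary.

```python
import math
import random


def simulate_ultrastable(gain_values=(-2.0, -1.0, 0.5, 1.0, 2.0),
                         initial_gain=2.0, x0=1.0, limit=10.0,
                         dt=0.01, steps=5000, seed=0):
    """Toy ultrastable system with one essential variable x.

    Between jumps, x follows dx/dt = gain * x (time does not appear
    explicitly), integrated with Euler steps of size dt. Whenever |x|
    reaches the limit (a critical state), x is held at the boundary and
    the step-function `gain` jumps to a randomly chosen value.
    Returns (final gain, final x, number of step-function changes).
    """
    rng = random.Random(seed)
    gain, x, changes = initial_gain, x0, 0
    for _ in range(steps):
        x += dt * gain * x
        if abs(x) >= limit:
            x = math.copysign(limit, x)     # sketch assumption: hold x at the limit
            gain = rng.choice(gain_values)  # step-function changes value
            changes += 1
    return gain, x, changes
```

Starting from an unstable gain, the variable climbs to the limit, the gain is redrawn until a negative value appears, and x then decays toward zero: a crude picture of adaptation by step-function change.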

Though the author speaks of critical points, perhaps it would be better 
to speak of critical regions. The topology is nowhere described explicitly, 
but the diagrams seem to indicate a simply connected finite region, no point 
of which is critical, but outside which every point is critical. One infers also 
that the critical region consists of critical subregions, possibly overlapping, 
each subregion being critical for a particular step-function. 

If there are $n$ step-functions, all independent, in the sense that by knowing
the values of $n-1$ of these we cannot infer the value of the $n$th, then there are
$2^n$ possibilities even if each step-function has only two possible values. If
it is a matter of pure chance which and how many step-functions change val- 
ues at any time, and if stability is achieved in only one or a small number of 







these possible cases, then the animal would probably be dead before it hits 
upon a favorable combination when it is subjected to conditions that evoke 
such changes. Partly to evade this difficulty, partly to account for the fact 
that learning and adaptation generally progress by degrees, the author intro- 
duces the notion of multistable systems, consisting of many ultrastable 
systems. Each ultrastable system contains only a small number of part- 
functions, and it can seek its own equilibrium, in some measure independent- 
ly of the other ultrastable systems which make up the organism as a whole. 
The independence is achieved by linking the ultrastable systems with
part-functions. Thus if subsystems $S_i$ and $S_j$ have only the variable $x$ in common,
and if $x$ is a part-function, then the systems are essentially independent when
$x$ is constant.
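The arithmetic behind both the difficulty and its remedy can be sketched under simple assumptions: two-valued step-functions, configurations tried at random with replacement, and exactly one stable configuration (per subsystem, in the multistable case). The expected number of trials is then the mean of a geometric distribution: 2^n for the whole system, against the sum of 2^(n/m) over m independent subsystems. These assumptions belong to the sketch, not to the author or the reviewer.

```python
def expected_trials_whole(n_step_functions, n_stable=1):
    """Mean number of random reconfigurations until a stable one appears.

    Assumes 2**n equally likely settings of n two-valued step-functions,
    n_stable of them stable, tried independently with replacement, so the
    count is geometric with mean 2**n / n_stable.
    """
    return 2 ** n_step_functions / n_stable


def expected_trials_multistable(n_step_functions, n_subsystems):
    """Total expected trials when the step-functions are split evenly among
    independent subsystems, each with one stable setting it finds on its own."""
    if n_step_functions % n_subsystems:
        raise ValueError("sketch assumes an even split of step-functions")
    per_subsystem = n_step_functions // n_subsystems
    return n_subsystems * expected_trials_whole(per_subsystem)
```

With n = 20, for example, the whole system expects 2^20 = 1,048,576 trials, while four independent subsystems of five step-functions each expect 4 × 32 = 128 in total.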

The argument is verbal and qualitative throughout. An appendix serves to 
give mathematical clarification to some of the notions and to develop a few 
auxiliary theorems, without purporting to constitute a formal demonstration 
of the theses. As a verbal development, it is lucid and persuasive. Numerous 
quotations from the literature in psychology, physiology, protozoology, 
etc. suggest the presence of step-functions and part-functions, and of ultra- 
stability, and otherwise illustrate the author’s argument. 

In principle it should be easy to construct an ultrastable system, and possi- 
bly also a multistable system “in the metal,” and the endeavor is to be recom- 
mended to those interested in robotology. The author, in fact, describes a 
system of the former type actually in existence, and states that a multistable 
system is under construction. These should be interesting to observe and 
might indeed exhibit many of the characteristics of living beings. 

The author, of course, promises only to show that a mechanical system 
can exhibit adaptive behavior. The basic question is, therefore, a very diffi- 
cult probabilistic question. Suppose one has constructed, in metal or on paper, 
an ultrastable or a multistable system. Consider, in probabilistic terms, the 
situations to which it might have to adapt or succumb. We can then ask what 
chance it has of surviving, or rather, what is its life expectancy? Even though 
it may be capable of adapting to any given situation, in the sense that there 
exists an appropriate set of values for its several step-functions, what are
the chances that one such set will be “discovered” before one of the variables 
exceeds physiological limits? 
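The reviewer's closing question can be put as a small calculation. Assume each reconfiguration independently lands on a stable setting with probability p, and that the variables allow at most T reconfigurations before one exceeds its limits; the chance of adapting in time is then 1 − (1 − p)^T, and averaging over a probability distribution of situations gives an overall survival chance. The names and input format are hypothetical, and fixing T in advance is itself a strong simplification.

```python
def survival_probability(p_stable, max_trials):
    """Chance that at least one of max_trials independent reconfigurations,
    each stable with probability p_stable, succeeds before limits are exceeded."""
    return 1.0 - (1.0 - p_stable) ** max_trials


def expected_survival(situations):
    """Average survival chance over a distribution of situations.

    situations: iterable of (probability_of_situation, p_stable, max_trials);
    the situation probabilities are assumed to sum to 1.
    """
    return sum(w * survival_probability(p, t) for w, p, t in situations)
```

With ten two-valued step-functions and one stable setting (p = 1/1024), a hundred trials give a survival chance of only about 0.09.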





RANDOM DIGITS (6876-8125) 


From A Million Random Digits, to be published by the Rand Corporation, Santa Monica, California 
Digits 6501-6875 were published in Vol. 48, p. 383 (June 1953) 


48190 75704 88298 15489 16030 
92955 47357 07839 62735 99218 
52319 41690 73298 51108 48717 
32705 46148 12829 70474 00838 
52653 33928 76569 61072 48568 


42480 15372 61781 41665 41339 
25624 02547 30570 58652 49983 
92926 75705 00042 13607 00657 
50385 91711 81077 55715 26203 
36491 22587 90960 04110 66683 


62106 44203 06732 14738 31300 
01669 27464 79553 19056 26225 
76173 43357 77334 75814 07158 
65933 51087 98234 62448 71251 
99001 09796 47349 80395 29991 


08681 58068 44115 40064 43286 
97543 37044 07494 85778 08703 
82763 25072 38478 57782 73100 
25572 79771 93328 66927 49416 
96526 02820 91659 12818 90785 


83642 21057 02677 09367 38097 
69167 30235 06767 66323 78294 
86018 29406 75415 22038 27056 
44114 06026 79553 55091 95385 
53805 64150 70915 63127 63695 


99859 10362 57411 40986 35045 
77644 39892 77327 74129 53444 
25793 14213 87082 42837 95030 
34683 81419 87133 70447 53127 
12147 58158 92124 60934 18414 


52065 91037 44797 52110 08512 
37632 23180 68124 18807 70997 
82576 43164 52643 96363 77989 
69023 92740 95319 04538 60660 
98949 46524 96627 33159 42081 


18991 96526 46326 39923 60625 
21913 33692 67053 03949 70082 
79332 50335 52928 70244 91954 
28982 88321 73797 49494 41878 
08816 90303 21407 90038 72638 


28386 22919 50486 11064 01790 
02303 48642 27882 34206 63132 
17401 92693 45144 32205 49508 
76635 83227 81020 92748 84147 
69692 51599 66831 02754 41731 


58817 86400 66213 74058 44968 
38837 40210 96346 30348 37978 
98425 02451 35423 59557 68318 
79835 94867 41224 67098 64405 
37068 32753 91059 60774 28136 





